diff --git a/.gitignore b/.gitignore
index 681f689e..2bc52715 100644
--- a/.gitignore
+++ b/.gitignore
@@ -74,6 +74,9 @@ docs/_build/
# PyBuilder
target/
+# Exception for dbt tests
+!tests/dbt_artifacts/target
+
# Jupyter Notebook
.ipynb_checkpoints
diff --git a/README.md b/README.md
index 01491a25..245cced7 100644
--- a/README.md
+++ b/README.md
@@ -2,160 +2,108 @@
-# **data-diff**
+
+data-diff
+
+
+
+Develop dbt models faster by testing as you code.
+
+
+See how every change to dbt code affects the data produced in the modified model and downstream.
+
+
## What is `data-diff`?
-data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable. Even at massive scale.
-## Documentation
+data-diff is an open-source package you can use to see how your dbt code changes affect the data in your models as you code.
-[**🗎 Documentation website**](https://docs.datafold.com/os_diff/about) - our detailed documentation has everything you need to start diffing.
+
-### Databases we support
+
-- PostgreSQL >=10
-- MySQL
-- Snowflake
-- BigQuery
-- Redshift
-- Oracle
-- Presto
-- Databricks
-- Trino
-- Clickhouse
-- Vertica
-- DuckDB >=0.6
-- SQLite (coming soon)
+
-For their corresponding connection strings, check out our [detailed table](https://docs.datafold.com/os_diff/databases_we_support).
+
-#### Looking for a database not on the list?
-If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list.
+## Getting Started
-## Use cases
+**Install `data-diff`**
-### Diff Tables Between Databases
-#### Quickly identify issues when moving data between databases
-
-
-
-
-
-### Diff Tables Within a Database
-#### Improve code reviews by identifying data problems you don't have tests for
-
-
-
-
-
-
-
-
-
-## Get started
-
-### Installation
-
-#### First, install `data-diff` using `pip`.
+Install `data-diff` with the command specific to the database you use with dbt.
+### Snowflake
```
-pip install data-diff
+pip install data-diff 'data-diff[snowflake,dbt]' -U
```
-#### Then, install one or more driver(s) specific to the database(s) you want to connect to.
-
-- `pip install 'data-diff[mysql]'`
-
-- `pip install 'data-diff[postgresql]'`
-
-- `pip install 'data-diff[snowflake]'`
-
-- `pip install 'data-diff[presto]'`
-
-- `pip install 'data-diff[oracle]'`
-
-- `pip install 'data-diff[trino]'`
-
-- `pip install 'data-diff[clickhouse]'`
-
-- `pip install 'data-diff[vertica]'`
-
-- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/
-
-_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._
-
-### Run your first diff
-
-Once you've installed `data-diff`, you can run it from the command line.
-
+### BigQuery
```
-data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
+pip install data-diff 'data-diff[dbt]' google-cloud-bigquery -U
```
-Be sure to read [the docs](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line) for detailed instructions how to build one of these commands depending on your database setup.
-
-#### Code Example: Diff Tables Between Databases
-Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres.
+### Redshift
+```
+pip install data-diff 'data-diff[redshift,dbt]' -U
+```
+### Postgres
```
-data-diff \
- postgresql://:''@localhost:5432/ \
- \
- "snowflake://:@//?warehouse=&role=" \
- \
- -k activity_id \
- -c activity \
- -w "event_timestamp < '2022-10-10'"
+pip install data-diff 'data-diff[postgres,dbt]' -U
```
-#### Code Example: Diff Tables Within a Database
+### Databricks
+```
+pip install data-diff 'data-diff[databricks,dbt]' -U
+```
-Here's a code example from [the video](https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2), where we compare data between two Snowflake tables within one database.
+### DuckDB
+```
+pip install data-diff 'data-diff[duckdb,dbt]' -U
+```
+**Update a few lines in your `dbt_project.yml`**.
```
-data-diff \
- "snowflake://:@//?warehouse=&role=" \
- . \
- -k org_id \
- -c created_at -c is_internal \
- -w "org_id != 1 and org_id < 2000" \
- -m test_results_%t \
- --materialize-all-rows \
- --table-write-limit 10000
+#dbt_project.yml
+vars:
+ data_diff:
+ prod_database: my_database
+ prod_schema: my_default_schema
```
-In both code examples, I've used `<>` carrots to represent values that **should be replaced with your values** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like.
+**Run your first data diff!**
-### We're here to help!
+```
+dbt run && data-diff --dbt
+```
-We know that in some cases, the data-diff command can become long and dense. And maybe you're new to the command line.
+We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details.
-* We're here to help [on slack](https://locallyoptimistic.slack.com/archives/C03HUNGQV0S) if you have ANY questions as you use `data-diff` in your workflow.
-* You can also post a question in [GitHub Discussions](https://github.com/datafold/data-diff/discussions).
+Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started!
+
-To get a Slack invite - [click here](https://locallyoptimistic.com/community/)
+### Diffing between databases
-## How to Use
+Check out our [documentation](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md) if you're looking to compare data across databases (for example, between Postgres and Snowflake).
-* [How to use from the shell (or: command-line)](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line)
-* [How to use from Python](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_python)
-* [How to use with TOML configuration file](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_toml)
-* [Usage Analytics & Data Privacy](https://docs.datafold.com/os_diff/usage_analytics_data_privacy)
+
-## How to Contribute
-* Feel free to open an issue or contribute to the project by working on an existing issue.
-* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started.
+## Contributors
-Big thanks to everyone who contributed so far:
+We thank everyone who contributed so far!
-## Technical Explanation
+
+
+## Analytics
+
+* [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md)
-Check out this [technical explanation](https://docs.datafold.com/os_diff/technical_explanation) of how data-diff works.
+
## License
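For the cross-database diffing mentioned in the README above, here is a minimal Python sketch using the package's `connect_to_table` and `diff_tables` API. The connection strings, table names, and `id` key column are placeholder assumptions, not values from this PR, and the relevant driver extras (e.g. `data-diff[postgres]`, `data-diff[snowflake]`) are assumed to be installed.

```python
# Minimal cross-database diff sketch; connection strings, table names,
# and the "id" key column below are placeholders.
from data_diff import connect_to_table, diff_tables

table1 = connect_to_table(
    "postgresql://user:password@localhost:5432/mydb", "public.orders", ("id",)
)
table2 = connect_to_table(
    "snowflake://user:password@account/MYDB/ANALYTICS?warehouse=WH&role=ROLE",
    "ORDERS",
    ("id",),
)

# Iterating the diff yields ('+', row) / ('-', row) tuples for rows that differ.
for sign, row in diff_tables(table1, table2):
    print(sign, row)
```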
diff --git a/data_diff/__init__.py b/data_diff/__init__.py
index 38e13760..0a3a9d71 100644
--- a/data_diff/__init__.py
+++ b/data_diff/__init__.py
@@ -1,6 +1,6 @@
from typing import Sequence, Tuple, Iterator, Optional, Union
-from sqeleton.abcs import DbTime, DbPath
+from data_diff.sqeleton.abcs import DbTime, DbPath
from .tracking import disable_tracking
from .databases import connect
diff --git a/data_diff/__main__.py b/data_diff/__main__.py
index 96602d25..383fb88d 100644
--- a/data_diff/__main__.py
+++ b/data_diff/__main__.py
@@ -10,8 +10,8 @@
import rich
import click
-from sqeleton.schema import create_schema
-from sqeleton.queries.api import current_timestamp
+from data_diff.sqeleton.schema import create_schema
+from data_diff.sqeleton.queries.api import current_timestamp
from .dbt import dbt_diff
from .utils import eval_name_template, remove_password_from_url, safezip, match_like
@@ -228,6 +228,13 @@ def write_usage(self, prog: str, args: str = "", prefix: Optional[str] = None) -
metavar="PATH",
help="Which directory to look in for the dbt_project.yml file. Default is the current working directory and its parents.",
)
+@click.option(
+ "--select",
+ "-s",
+ default=None,
+ metavar="PATH",
+ help="select dbt resources to compare using dbt selection syntax",
+)
def main(conf, run, **kw):
if kw["table2"] is None and kw["database2"]:
# Use the "database table table" form
@@ -264,6 +271,7 @@ def main(conf, run, **kw):
profiles_dir_override=kw["dbt_profiles_dir"],
project_dir_override=kw["dbt_project_dir"],
is_cloud=kw["cloud"],
+ dbt_selection=kw["select"],
)
else:
return _data_diff(**kw)
@@ -306,6 +314,7 @@ def _data_diff(
cloud,
dbt_profiles_dir,
dbt_project_dir,
+ select,
threads1=None,
threads2=None,
__conf__=None,
diff --git a/data_diff/cloud/__init__.py b/data_diff/cloud/__init__.py
new file mode 100644
index 00000000..5893496d
--- /dev/null
+++ b/data_diff/cloud/__init__.py
@@ -0,0 +1,2 @@
+from .datafold_api import DatafoldAPI, TCloudApiDataDiff
+from .data_source import get_or_create_data_source
diff --git a/data_diff/cloud/data_source.py b/data_diff/cloud/data_source.py
new file mode 100644
index 00000000..05331c01
--- /dev/null
+++ b/data_diff/cloud/data_source.py
@@ -0,0 +1,321 @@
+import json
+import time
+from typing import List, Optional, Union, overload
+
+import pydantic
+import rich
+from rich.table import Table
+from rich.prompt import Confirm, Prompt, FloatPrompt, IntPrompt, InvalidResponse
+from typing_extensions import Literal
+
+from .datafold_api import (
+ DatafoldAPI,
+ TCloudApiDataSourceConfigSchema,
+ TCloudApiDataSource,
+ TDsConfig,
+ TestDataSourceStatus,
+)
+from ..dbt_parser import DbtParser
+
+
+UNKNOWN_VALUE = "unknown_value"
+
+
+class TDataSourceTestStage(pydantic.BaseModel):
+ name: str
+ status: TestDataSourceStatus
+ description: str = ""
+
+
+class TemporarySchemaPrompt(Prompt):
+ response_type = str
+
+ def process_response(self, value: str) -> str:
+ """Convert choices to a bool."""
+
+ if len(value.split(".")) != 2:
+ raise InvalidResponse("Temporary schema should have a format .")
+ return value
+
+
+class ValueRequiredPrompt(Prompt):
+ def process_response(self, value: str) -> str:
+ value = super().process_response(value)
+ if value == UNKNOWN_VALUE or value is None or value == "":
+ raise InvalidResponse("Parameter must not be empty")
+ return value
+
+
+def _validate_temp_schema(temp_schema: str):
+ if len(temp_schema.split(".")) != 2:
+ raise ValueError("Temporary schema should have a format .")
+
+
+def _get_temp_schema(dbt_parser: DbtParser, db_type: str) -> Optional[str]:
+ diff_vars = dbt_parser.get_datadiff_variables()
+ config_prod_database = diff_vars.get("prod_database")
+ config_prod_schema = diff_vars.get("prod_schema")
+ if config_prod_database is not None and config_prod_schema is not None:
+ temp_schema = f"{config_prod_database}.{config_prod_schema}"
+ if db_type == "snowflake":
+ return temp_schema.upper()
+ elif db_type in {"pg", "postgres_aurora", "postgres_aws_rds", "redshift"}:
+ return temp_schema.lower()
+ return temp_schema
+ return
+
+
+def create_ds_config(
+ ds_config: TCloudApiDataSourceConfigSchema,
+ data_source_name: str,
+ dbt_parser: Optional[DbtParser] = None,
+) -> TDsConfig:
+ options = _parse_ds_credentials(ds_config=ds_config, only_basic_settings=True, dbt_parser=dbt_parser)
+
+ temp_schema = _get_temp_schema(dbt_parser=dbt_parser, db_type=ds_config.db_type) if dbt_parser else None
+ if temp_schema:
+ temp_schema = TemporarySchemaPrompt.ask("Temporary schema", default=temp_schema)
+ else:
+        temp_schema = TemporarySchemaPrompt.ask("Temporary schema (<database>.<schema>)")
+
+ float_tolerance = FloatPrompt.ask("Float tolerance", default=0.000001)
+
+ return TDsConfig(
+ name=data_source_name,
+ type=ds_config.db_type,
+ temp_schema=temp_schema,
+ float_tolerance=float_tolerance,
+ options=options,
+ )
+
+
+@overload
+def _cast_value(value: str, type_: Literal["integer"]) -> int:
+ ...
+
+
+@overload
+def _cast_value(value: str, type_: Literal["boolean"]) -> bool:
+ ...
+
+
+@overload
+def _cast_value(value: str, type_: Literal["string"]) -> str:
+ ...
+
+
+def _cast_value(value: str, type_: str) -> Union[bool, int, str]:
+ if type_ == "integer":
+ return int(value)
+ elif type_ == "boolean":
+ return bool(value)
+ return value
+
+
+def _get_data_from_bigquery_json(path: str):
+ with open(path, "r") as file:
+ return json.load(file)
+
+
+def _align_dbt_cred_params_with_datafold_params(dbt_creds: dict) -> dict:
+ db_type = dbt_creds["type"]
+ if db_type == "bigquery":
+ method = dbt_creds["method"]
+ if method == "service-account":
+ data = _get_data_from_bigquery_json(path=dbt_creds["keyfile"])
+ dbt_creds["jsonKeyFile"] = json.dumps(data)
+ elif method == "service-account-json":
+ dbt_creds["jsonKeyFile"] = json.dumps(dbt_creds["keyfile_json"])
+ else:
+ rich.print(
+                f'[red]Cannot extract BigQuery credentials from profiles.yml for the "{method}" method. '
+                f"If you want to provide credentials via profiles.yml, "
+                f'please use "service-account" or "service-account-json" '
+                f"(more in the docs: https://docs.getdbt.com/reference/warehouse-setups/bigquery-setup). "
+                f"Otherwise, you can provide a path to a JSON key file, or the JSON key file contents, as input."
+ )
+ dbt_creds["projectId"] = dbt_creds["project"]
+ elif db_type == "snowflake":
+ dbt_creds["default_db"] = dbt_creds["database"]
+ elif db_type == "databricks":
+ dbt_creds["http_password"] = dbt_creds["token"]
+ dbt_creds["database"] = dbt_creds.get("catalog")
+ return dbt_creds
+
+
+def _parse_ds_credentials(
+ ds_config: TCloudApiDataSourceConfigSchema, only_basic_settings: bool = True, dbt_parser: Optional[DbtParser] = None
+):
+ creds = {}
+ use_dbt_data = False
+ if dbt_parser is not None:
+ use_dbt_data = Confirm.ask("Would you like to extract database credentials from dbt profiles.yml?")
+ try:
+ creds = dbt_parser.get_connection_creds()[0]
+ creds = _align_dbt_cred_params_with_datafold_params(dbt_creds=creds)
+ except Exception as e:
+ rich.print(f"[red]Cannot parse database credentials from dbt profiles.yml. Reason: {e}")
+
+ ds_options = {}
+ basic_required_fields = set(ds_config.config_schema.required)
+ for param_name, param_data in ds_config.config_schema.properties.items():
+ if only_basic_settings and param_name not in basic_required_fields:
+ continue
+
+ default_value = param_data.get("default", UNKNOWN_VALUE)
+ is_password = bool(param_data.get("format"))
+
+ title = param_data["title"]
+ type_ = param_data["type"]
+ input_values = {
+ "prompt": title,
+ "password": is_password,
+ }
+ if default_value != UNKNOWN_VALUE:
+ input_values["default"] = default_value
+
+ if use_dbt_data:
+ value = creds.get(param_name, UNKNOWN_VALUE)
+ if value == UNKNOWN_VALUE:
+ rich.print(f'[red]Cannot extract "{param_name}" from dbt profiles.yml. Please, type it manually')
+ else:
+ ds_options[param_name] = _cast_value(value, type_)
+ continue
+
+ if type_ == "integer":
+ value = IntPrompt.ask(**input_values)
+ elif type_ == "boolean":
+ value = Confirm.ask(title)
+ else:
+ value = ValueRequiredPrompt.ask(**input_values)
+
+ ds_options[param_name] = value
+ return ds_options
+
+
+def _check_data_source_exists(
+ data_sources: List[TCloudApiDataSource],
+ data_source_name: str,
+) -> Optional[TCloudApiDataSource]:
+ for ds in data_sources:
+ if ds.name == data_source_name:
+ return ds
+ return None
+
+
+def _test_data_source(api: DatafoldAPI, data_source_id: int, timeout: int = 64) -> List[TDataSourceTestStage]:
+ job_id = api.test_data_source(data_source_id)
+
+ checked_tests = {"connection", "temp_schema", "schema_download"}
+ seconds = 1
+ start = time.monotonic()
+ results = []
+ while True:
+ tests = api.check_data_source_test_results(job_id)
+ for test in tests:
+ if test.name not in checked_tests:
+ continue
+
+ if test.status == "done":
+ checked_tests.remove(test.name)
+ results.append(
+ TDataSourceTestStage(name=test.name, status=test.result.status, description=test.result.message)
+ )
+
+ if not checked_tests:
+ break
+
+ if time.monotonic() - start > timeout:
+ for test_name in checked_tests:
+ results.append(
+ TDataSourceTestStage(
+ name=test_name,
+ status=TestDataSourceStatus.SKIP,
+ description=f"Does not complete in {timeout} seconds",
+ )
+ )
+ break
+ time.sleep(seconds)
+ seconds *= 2
+
+ return results
+
+
+def _render_data_source(data_source: TCloudApiDataSource, title: str = "") -> None:
+ table = Table(title=title, min_width=80)
+ table.add_column("Parameter", justify="center", style="cyan")
+ table.add_column("Value", justify="center", style="magenta")
+ table.add_row("ID", str(data_source.id))
+ table.add_row("Name", data_source.name)
+ table.add_row("Type", data_source.type)
+ rich.print(table)
+
+
+def _render_available_data_sources(data_source_schema_configs: List[TCloudApiDataSourceConfigSchema]) -> None:
+ config_names = [ds_config.name for ds_config in data_source_schema_configs]
+
+ table = Table()
+ table.add_column("", justify="center", style="cyan")
+ table.add_column("Available data sources", style="magenta")
+ for i, db_type in enumerate(config_names, start=1):
+ table.add_row(str(i), db_type)
+ rich.print(table)
+
+
+def _render_data_source_test_results(test_results: List[TDataSourceTestStage]) -> None:
+ table = Table(title="Test results", min_width=80)
+ table.add_column(
+ "Test",
+ justify="center",
+ style="cyan",
+ )
+ table.add_column("Status", justify="center", style="magenta")
+ table.add_column("Description", justify="center", style="magenta")
+ for result in test_results:
+ table.add_row(result.name, result.status, result.description)
+ rich.print(table)
+
+
+def get_or_create_data_source(api: DatafoldAPI, dbt_parser: Optional[DbtParser] = None) -> int:
+ ds_configs = api.get_data_source_schema_config()
+ data_sources = api.get_data_sources()
+
+ _render_available_data_sources(data_source_schema_configs=ds_configs)
+ db_type_num = IntPrompt.ask(
+ prompt="What data source type do you want to create? Please, select a number",
+ choices=list(map(str, range(1, len(ds_configs) + 1))),
+ show_choices=False,
+ )
+
+ ds_config = ds_configs[db_type_num - 1]
+ default_ds_name = ds_config.name
+ rich.print("Press enter to accept the (Default value)")
+ ds_name = Prompt.ask("Data source name", default=default_ds_name)
+
+ ds = _check_data_source_exists(data_sources=data_sources, data_source_name=ds_name)
+ if ds is not None:
+ _render_data_source(data_source=ds, title=f'Found existing data source for name "{ds.name}"')
+ use_existing_ds = Confirm.ask("Would you like to continue with the existing data source?")
+ if not use_existing_ds:
+ return get_or_create_data_source(api=api, dbt_parser=dbt_parser)
+ return ds.id
+
+ ds_config = create_ds_config(ds_config=ds_config, data_source_name=ds_name, dbt_parser=dbt_parser)
+ ds = api.create_data_source(ds_config)
+ data_source_url = f"{api.host}/settings/integrations/dwh/{ds.type}/{ds.id}"
+ _render_data_source(data_source=ds, title=f"Created a new data source with ID = {ds.id} ({data_source_url})")
+
+ rich.print(
+ "We recommend to run tests for a new data source. "
+ "It requires some time but makes sure that the data source is configured correctly."
+ )
+ run_tests = Confirm.ask("Would you like to run tests?")
+ if run_tests:
+ test_results = _test_data_source(api=api, data_source_id=ds.id)
+ _render_data_source_test_results(test_results=test_results)
+ if any(result.status == TestDataSourceStatus.FAILED for result in test_results):
+ raise ValueError(
+ f"Data source tests failed. Please, try to update or test data source in the UI: {data_source_url}"
+ )
+
+ return ds.id
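As a usage sketch, this interactive flow can be driven the same way `dbt_diff` wires it up in `dbt.py`. The API key below is a placeholder, and `get_or_create_data_source` prompts on stdin.

```python
# Sketch mirroring how dbt_diff() uses this module; the API key is a placeholder
# and get_or_create_data_source() prompts interactively on stdin.
from data_diff.cloud import DatafoldAPI, get_or_create_data_source
from data_diff.dbt_parser import DbtParser

api = DatafoldAPI(api_key="YOUR_DATAFOLD_API_KEY")
dbt_parser = DbtParser(profiles_dir_override=None, project_dir_override=None)

# Reuses an existing data source with the chosen name, or walks through creating
# (and optionally testing) a new one, returning its numeric ID.
datasource_id = get_or_create_data_source(api=api, dbt_parser=dbt_parser)
print(f"Using data source {datasource_id}")
```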
diff --git a/data_diff/cloud/datafold_api.py b/data_diff/cloud/datafold_api.py
new file mode 100644
index 00000000..84d813f7
--- /dev/null
+++ b/data_diff/cloud/datafold_api.py
@@ -0,0 +1,269 @@
+import base64
+import dataclasses
+import enum
+import time
+from typing import Any, Dict, List, Optional, Type, TypeVar, Tuple
+
+import pydantic
+import requests
+
+from ..utils import getLogger
+
+logger = getLogger(__name__)
+
+Self = TypeVar("Self", bound=pydantic.BaseModel)
+
+
+class TestDataSourceStatus(str, enum.Enum):
+ SUCCESS = "ok"
+ FAILED = "error"
+ SKIP = "skip"
+ UNKNOWN = "unknown"
+
+
+class TCloudApiDataSourceSchema(pydantic.BaseModel):
+ title: str
+ properties: Dict[str, Dict[str, Any]]
+ required: List[str]
+ secret: List[str]
+
+ @classmethod
+ def from_orm(cls: Type[Self], obj: Any) -> Self:
+ data_source_types_required_parameters = {
+ "bigquery": ["projectId", "jsonKeyFile", "location"],
+ "databricks": ["host", "http_password", "database", "http_path"],
+ "mysql": ["host", "user", "passwd", "db"],
+ "pg": ["host", "user", "port", "password", "dbname"],
+ "postgres_aurora": ["host", "user", "port", "password", "dbname"],
+ "postgres_aws_rds": ["host", "user", "port", "password", "dbname"],
+ "redshift": ["host", "user", "port", "password", "dbname"],
+ "snowflake": ["account", "user", "password", "warehouse", "role", "default_db"],
+ }
+
+ return cls(
+ title=obj["configuration_schema"]["title"],
+ properties=obj["configuration_schema"]["properties"],
+ required=data_source_types_required_parameters[obj["type"]],
+ secret=obj["configuration_schema"]["secret"],
+ )
+
+
+class TCloudApiDataSourceConfigSchema(pydantic.BaseModel):
+ name: str
+ db_type: str
+ config_schema: TCloudApiDataSourceSchema
+
+
+class TCloudApiDataSource(pydantic.BaseModel):
+ id: Optional[int] = None
+ name: str
+ type: str
+ is_paused: Optional[bool] = False
+ hidden: Optional[bool] = False
+ temp_schema: Optional[str] = None
+ disable_schema_indexing: Optional[bool] = False
+ disable_profiling: Optional[bool] = False
+ catalog_include_list: Optional[str] = None
+ catalog_exclude_list: Optional[str] = None
+ schema_indexing_schedule: Optional[str] = None
+ schema_max_age_s: Optional[int] = None
+ profile_schedule: Optional[str] = None
+ profile_exclude_list: Optional[str] = None
+ profile_include_list: Optional[str] = None
+ discourage_manual_profiling: Optional[bool] = False
+ lineage_schedule: Optional[str] = None
+ float_tolerance: Optional[float] = 0.0
+ options: Optional[Dict[str, Any]] = None
+ queue_name: Optional[str] = None
+ scheduled_queue_name: Optional[str] = None
+ groups: Optional[Dict[int, bool]] = None
+ view_only: Optional[bool] = False
+ created_from: Optional[str] = None
+ source: Optional[str] = None
+ max_allowed_connections: Optional[int] = None
+ last_test: Optional[Any] = None
+ secret_id: Optional[int] = None
+
+
+class TDsConfig(pydantic.BaseModel):
+ name: str
+ type: str
+ temp_schema: str
+ float_tolerance: float = 0.0
+ options: Dict[str, Any]
+ disable_schema_indexing: bool = True
+ disable_profiling: bool = True
+
+
+class TCloudApiDataDiff(pydantic.BaseModel):
+ data_source1_id: int
+ data_source2_id: int
+ table1: List[str]
+ table2: List[str]
+ pk_columns: List[str]
+ filter1: Optional[str] = None
+ filter2: Optional[str] = None
+
+
+class TSummaryResultPrimaryKeyStats(pydantic.BaseModel):
+ total_rows: Tuple[int, int]
+ nulls: Tuple[int, int]
+ dupes: Tuple[int, int]
+ exclusives: Tuple[int, int]
+ distincts: Tuple[int, int]
+
+
+class TSummaryResultColumnDiffStats(pydantic.BaseModel):
+ column_name: str
+ match: float
+
+
+class TSummaryResultValueStats(pydantic.BaseModel):
+ total_rows: int
+ rows_with_differences: int
+ total_values: int
+ compared_columns: int
+ columns_with_differences: int
+ columns_diff_stats: List[TSummaryResultColumnDiffStats]
+
+
+class TSummaryResultSchemaStats(pydantic.BaseModel):
+ columns_mismatched: Tuple[int, int]
+ column_type_mismatches: int
+ column_reorders: int
+ column_counts: Tuple[int, int]
+
+
+class TCloudApiDataDiffSummaryResult(pydantic.BaseModel):
+ status: str
+ pks: Optional[TSummaryResultPrimaryKeyStats]
+ values: Optional[TSummaryResultValueStats]
+ schema_: Optional[TSummaryResultSchemaStats]
+ dependencies: Optional[Dict[str, Any]]
+
+ @classmethod
+ def from_orm(cls: Type[Self], obj: Any) -> Self:
+ pks = TSummaryResultPrimaryKeyStats(**obj["pks"]) if "pks" in obj else None
+ values = TSummaryResultValueStats(**obj["values"]) if "values" in obj else None
+ deps = obj["deps"] if "deps" in obj else None
+ schema = TSummaryResultSchemaStats(**obj["schema"]) if "schema" in obj else None
+ return cls(
+ status=obj["status"],
+ pks=pks,
+ values=values,
+ schema_=schema,
+ deps=deps,
+ )
+
+
+class TCloudDataSourceTestResult(pydantic.BaseModel):
+ status: TestDataSourceStatus
+ message: str
+ outcome: str
+
+
+class TCloudApiDataSourceTestResult(pydantic.BaseModel):
+ name: str
+ status: str
+ result: Optional[TCloudDataSourceTestResult]
+
+
+@dataclasses.dataclass
+class DatafoldAPI:
+ api_key: str
+ host: str = "https://app.datafold.com"
+ timeout: int = 30
+
+ def __post_init__(self):
+ self.host = self.host.rstrip("/")
+ self.headers = {
+ "Authorization": f"Key {self.api_key}",
+ "Content-Type": "application/json",
+ }
+
+ def make_get_request(self, url: str) -> Any:
+ rv = requests.get(url=f"{self.host}/{url}", headers=self.headers, timeout=self.timeout)
+ rv.raise_for_status()
+ return rv
+
+ def make_post_request(self, url: str, payload: Any) -> Any:
+ rv = requests.post(url=f"{self.host}/{url}", headers=self.headers, json=payload, timeout=self.timeout)
+ rv.raise_for_status()
+ return rv
+
+ def get_data_sources(self) -> List[TCloudApiDataSource]:
+ rv = self.make_get_request(url="api/v1/data_sources")
+ rv.raise_for_status()
+ return [TCloudApiDataSource(**item) for item in rv.json()]
+
+ def create_data_source(self, config: TDsConfig) -> TCloudApiDataSource:
+ payload = config.dict()
+ if config.type == "bigquery":
+ json_string = payload["options"]["jsonKeyFile"].encode("utf-8")
+ payload["options"]["jsonKeyFile"] = base64.b64encode(json_string).decode("utf-8")
+ rv = self.make_post_request(url="api/v1/data_sources", payload=payload)
+ return TCloudApiDataSource(**rv.json())
+
+ def get_data_source_schema_config(
+ self,
+ only_important_properties: bool = False,
+ ) -> List[TCloudApiDataSourceConfigSchema]:
+ rv = self.make_get_request(url="api/v1/data_sources/types")
+ return [
+ TCloudApiDataSourceConfigSchema(
+ name=item["name"],
+ db_type=item["type"],
+ config_schema=TCloudApiDataSourceSchema.from_orm(obj=item),
+ )
+ for item in rv.json()
+ ]
+
+ def create_data_diff(self, payload: TCloudApiDataDiff) -> int:
+ rv = self.make_post_request(url="api/v1/datadiffs", payload=payload.dict())
+ return rv.json()["id"]
+
+ def poll_data_diff_results(self, diff_id: int) -> TCloudApiDataDiffSummaryResult:
+ summary_results = None
+ start_time = time.monotonic()
+ sleep_interval = 5 # starts at 5 sec
+ max_sleep_interval = 30
+ max_wait_time = 300
+
+ diff_url = f"{self.host}/datadiffs/{diff_id}/overview"
+ while not summary_results:
+ logger.debug(f"Polling: {diff_url}")
+ response = self.make_get_request(url=f"api/v1/datadiffs/{diff_id}/summary_results")
+ response_json = response.json()
+ if response_json["status"] == "success":
+ summary_results = response_json
+ elif response_json["status"] == "failed":
+ raise Exception(f"Diff failed: {str(response_json)}")
+
+ if time.monotonic() - start_time > max_wait_time:
+ raise Exception(f"Timed out waiting for diff results. Please, go to the UI for details: {diff_url}")
+
+ time.sleep(sleep_interval)
+ sleep_interval = min(sleep_interval * 2, max_sleep_interval)
+
+ return TCloudApiDataDiffSummaryResult.from_orm(summary_results)
+
+ def test_data_source(self, data_source_id: int) -> int:
+ rv = self.make_post_request(f"api/v1/data_sources/{data_source_id}/test", {})
+ return rv.json()["job_id"]
+
+ def check_data_source_test_results(self, job_id: int) -> List[TCloudApiDataSourceTestResult]:
+ rv = self.make_get_request(f"api/v1/data_sources/test/{job_id}")
+ return [
+ TCloudApiDataSourceTestResult(
+ name=item["step"],
+ status=item["status"],
+ result=TCloudDataSourceTestResult(
+ status=item["result"]["code"].lower(),
+ message=item["result"]["message"],
+ outcome=item["result"]["outcome"],
+ )
+ if item["result"] is not None
+ else None,
+ )
+ for item in rv.json()["results"]
+ ]
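To show how this client is intended to be used, here is a hedged sketch of submitting a diff and polling for its summary, mirroring the `_cloud_diff` flow in `dbt.py`. The data source ID, table paths, and primary key are assumptions.

```python
# Sketch based on the _cloud_diff flow; the data source ID, table paths,
# and primary key are placeholders.
from data_diff.cloud import DatafoldAPI, TCloudApiDataDiff

api = DatafoldAPI(api_key="YOUR_DATAFOLD_API_KEY")  # host defaults to https://app.datafold.com

payload = TCloudApiDataDiff(
    data_source1_id=1234,
    data_source2_id=1234,
    table1=["MY_DB", "PROD_SCHEMA", "ORDERS"],
    table2=["MY_DB", "DEV_SCHEMA", "ORDERS"],
    pk_columns=["ORDER_ID"],
)

diff_id = api.create_data_diff(payload=payload)
print(f"Diff in progress: {api.host}/datadiffs/{diff_id}/overview")

# Blocks (with exponential backoff) until the diff finishes, then returns summary stats;
# raises if the diff fails or times out.
summary = api.poll_data_diff_results(diff_id)
print(summary.pks.exclusives, summary.values.rows_with_differences)
```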
diff --git a/data_diff/databases/__init__.py b/data_diff/databases/__init__.py
index 7ae94f92..9b9a81ea 100644
--- a/data_diff/databases/__init__.py
+++ b/data_diff/databases/__init__.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import MD5_HEXDIGITS, CHECKSUM_HEXDIGITS, QueryError, ConnectError
+from data_diff.sqeleton.databases import MD5_HEXDIGITS, CHECKSUM_HEXDIGITS, QueryError, ConnectError
from .postgresql import PostgreSQL
from .mysql import MySQL
diff --git a/data_diff/databases/_connect.py b/data_diff/databases/_connect.py
index abad1580..6ca94246 100644
--- a/data_diff/databases/_connect.py
+++ b/data_diff/databases/_connect.py
@@ -1,6 +1,6 @@
import logging
-from sqeleton.databases import Connect
+from data_diff.sqeleton.databases import Connect
from .postgresql import PostgreSQL
from .mysql import MySQL
diff --git a/data_diff/databases/base.py b/data_diff/databases/base.py
index 96007ebf..5b7ff5ce 100644
--- a/data_diff/databases/base.py
+++ b/data_diff/databases/base.py
@@ -1,4 +1,4 @@
-from sqeleton.abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue
+from data_diff.sqeleton.abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue
class DatadiffDialect(AbstractMixin_MD5, AbstractMixin_NormalizeValue):
diff --git a/data_diff/databases/bigquery.py b/data_diff/databases/bigquery.py
index ccf68f19..3fe611bd 100644
--- a/data_diff/databases/bigquery.py
+++ b/data_diff/databases/bigquery.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import bigquery
+from data_diff.sqeleton.databases import bigquery
from .base import DatadiffDialect
diff --git a/data_diff/databases/clickhouse.py b/data_diff/databases/clickhouse.py
index 1a9bf5ee..feb1b884 100644
--- a/data_diff/databases/clickhouse.py
+++ b/data_diff/databases/clickhouse.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import clickhouse
+from data_diff.sqeleton.databases import clickhouse
from .base import DatadiffDialect
diff --git a/data_diff/databases/databricks.py b/data_diff/databases/databricks.py
index 5191a93d..9fa83307 100644
--- a/data_diff/databases/databricks.py
+++ b/data_diff/databases/databricks.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import databricks
+from data_diff.sqeleton.databases import databricks
from .base import DatadiffDialect
diff --git a/data_diff/databases/duckdb.py b/data_diff/databases/duckdb.py
index 762a482f..60799aa1 100644
--- a/data_diff/databases/duckdb.py
+++ b/data_diff/databases/duckdb.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import duckdb
+from data_diff.sqeleton.databases import duckdb
from .base import DatadiffDialect
diff --git a/data_diff/databases/mysql.py b/data_diff/databases/mysql.py
index de3f051e..05ebf1a7 100644
--- a/data_diff/databases/mysql.py
+++ b/data_diff/databases/mysql.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import mysql
+from data_diff.sqeleton.databases import mysql
from .base import DatadiffDialect
diff --git a/data_diff/databases/oracle.py b/data_diff/databases/oracle.py
index 223b61f9..db819cc3 100644
--- a/data_diff/databases/oracle.py
+++ b/data_diff/databases/oracle.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import oracle
+from data_diff.sqeleton.databases import oracle
from .base import DatadiffDialect
diff --git a/data_diff/databases/postgresql.py b/data_diff/databases/postgresql.py
index f4828eb0..75613e8b 100644
--- a/data_diff/databases/postgresql.py
+++ b/data_diff/databases/postgresql.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import postgresql as pg
+from data_diff.sqeleton.databases import postgresql as pg
from .base import DatadiffDialect
diff --git a/data_diff/databases/presto.py b/data_diff/databases/presto.py
index 3de970fd..2c95ffbe 100644
--- a/data_diff/databases/presto.py
+++ b/data_diff/databases/presto.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import presto
+from data_diff.sqeleton.databases import presto
from .base import DatadiffDialect
diff --git a/data_diff/databases/redshift.py b/data_diff/databases/redshift.py
index 51f05305..6928ade2 100644
--- a/data_diff/databases/redshift.py
+++ b/data_diff/databases/redshift.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import redshift
+from data_diff.sqeleton.databases import redshift
from .base import DatadiffDialect
diff --git a/data_diff/databases/snowflake.py b/data_diff/databases/snowflake.py
index e32c47f9..84487f15 100644
--- a/data_diff/databases/snowflake.py
+++ b/data_diff/databases/snowflake.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import snowflake
+from data_diff.sqeleton.databases import snowflake
from .base import DatadiffDialect
diff --git a/data_diff/databases/trino.py b/data_diff/databases/trino.py
index e4d88e12..5f686088 100644
--- a/data_diff/databases/trino.py
+++ b/data_diff/databases/trino.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import trino
+from data_diff.sqeleton.databases import trino
from .base import DatadiffDialect
diff --git a/data_diff/databases/vertica.py b/data_diff/databases/vertica.py
index eb891c3c..19ccd7d9 100644
--- a/data_diff/databases/vertica.py
+++ b/data_diff/databases/vertica.py
@@ -1,4 +1,4 @@
-from sqeleton.databases import vertica
+from data_diff.sqeleton.databases import vertica
from .base import DatadiffDialect
diff --git a/data_diff/dbt.py b/data_diff/dbt.py
index 0baec9d5..ec139b27 100644
--- a/data_diff/dbt.py
+++ b/data_diff/dbt.py
@@ -1,87 +1,96 @@
-import json
-import logging
import os
import time
+import webbrowser
import rich
+from rich.prompt import Confirm
+
from dataclasses import dataclass
-from packaging.version import parse as parse_version
-from typing import List, Optional, Dict, Tuple
+from typing import List, Optional, Dict
+from .utils import dbt_diff_string_template, getLogger
from pathlib import Path
-import requests
+import keyring
+from .cloud import DatafoldAPI, TCloudApiDataDiff, get_or_create_data_source
+from .dbt_parser import DbtParser, PROJECT_FILE
-def import_dbt():
- try:
- from dbt_artifacts_parser.parser import parse_run_results, parse_manifest
- from dbt.config.renderer import ProfileRenderer
- import yaml
- except ImportError:
- raise RuntimeError("Could not import 'dbt' package. You can install it using: pip install 'data-diff[dbt]'.")
- return parse_run_results, parse_manifest, ProfileRenderer, yaml
+logger = getLogger(__name__)
from .tracking import (
set_entrypoint_name,
+ set_dbt_user_id,
+ set_dbt_version,
+ set_dbt_project_id,
create_end_event_json,
create_start_event_json,
send_event_json,
is_tracking_enabled,
)
-from .utils import get_from_dict_with_raise, run_as_daemon, truncate_error
+from .utils import run_as_daemon, truncate_error
from . import connect_to_table, diff_tables, Algorithm
-RUN_RESULTS_PATH = "target/run_results.json"
-MANIFEST_PATH = "target/manifest.json"
-PROJECT_FILE = "dbt_project.yml"
-PROFILES_FILE = "profiles.yml"
-LOWER_DBT_V = "1.0.0"
-UPPER_DBT_V = "1.4.5"
-
-
-# https://github.com/dbt-labs/dbt-core/blob/c952d44ec5c2506995fbad75320acbae49125d3d/core/dbt/cli/resolvers.py#L6
-def default_project_dir() -> Path:
- paths = list(Path.cwd().parents)
- paths.insert(0, Path.cwd())
- return next((x for x in paths if (x / PROJECT_FILE).exists()), Path.cwd())
-
-
-# https://github.com/dbt-labs/dbt-core/blob/c952d44ec5c2506995fbad75320acbae49125d3d/core/dbt/cli/resolvers.py#L12
-def default_profiles_dir() -> Path:
- return Path.cwd() if (Path.cwd() / PROFILES_FILE).exists() else Path.home() / ".dbt"
-
-
-def legacy_profiles_dir() -> Path:
- return Path.home() / ".dbt"
-
@dataclass
class DiffVars:
dev_path: List[str]
prod_path: List[str]
primary_keys: List[str]
- datasource_id: str
connection: Dict[str, str]
threads: Optional[int]
+ where_filter: Optional[str] = None
def dbt_diff(
- profiles_dir_override: Optional[str] = None, project_dir_override: Optional[str] = None, is_cloud: bool = False
+ profiles_dir_override: Optional[str] = None,
+ project_dir_override: Optional[str] = None,
+ is_cloud: bool = False,
+ dbt_selection: Optional[str] = None,
) -> None:
+ diff_threads = []
set_entrypoint_name("CLI-dbt")
- dbt_parser = DbtParser(profiles_dir_override, project_dir_override, is_cloud)
- models = dbt_parser.get_models()
- dbt_parser.set_project_dict()
+ dbt_parser = DbtParser(profiles_dir_override, project_dir_override)
+ models = dbt_parser.get_models(dbt_selection)
datadiff_variables = dbt_parser.get_datadiff_variables()
config_prod_database = datadiff_variables.get("prod_database")
config_prod_schema = datadiff_variables.get("prod_schema")
+ config_prod_custom_schema = datadiff_variables.get("prod_custom_schema")
datasource_id = datadiff_variables.get("datasource_id")
- custom_schemas = datadiff_variables.get("custom_schemas")
- # custom schemas is default dbt behavior, so default to True if the var doesn't exist
- custom_schemas = True if custom_schemas is None else custom_schemas
+ set_dbt_user_id(dbt_parser.dbt_user_id)
+ set_dbt_version(dbt_parser.dbt_version)
+ set_dbt_project_id(dbt_parser.dbt_project_id)
+
+ if datadiff_variables.get("custom_schemas") is not None:
+ logger.warning(
+ "vars: data_diff: custom_schemas: is no longer used and can be removed.\nTo utilize custom schemas, see the documentation here: https://docs.datafold.com/development_testing/open_source"
+ )
- if not is_cloud:
+ if is_cloud:
+ api = _initialize_api()
+ # exit so the user can set the key
+ if not api:
+ return
+
+ if datasource_id is None:
+ rich.print("[red]Data source ID not found in dbt_project.yml")
+ is_create_data_source = Confirm.ask("Would you like to create a new data source?")
+ if is_create_data_source:
+ datasource_id = get_or_create_data_source(api=api, dbt_parser=dbt_parser)
+            rich.print(f'To use this data source in future runs, please update your "{PROJECT_FILE}" with the following block:')
+ rich.print(f"[green]vars:\n data_diff:\n datasource_id: {datasource_id}\n")
+ rich.print(
+ "Read more about Datafold vars in docs: "
+ "https://docs.datafold.com/os_diff/dbt_integration/#configure-a-data-source\n"
+ )
+ else:
+ raise ValueError(
+ "Datasource ID not found, include it as a dbt variable in the dbt_project.yml. "
+ "\nvars:\n data_diff:\n datasource_id: 1234"
+ )
+ rich.print("[green][bold]\nDiffs in progress...[/][/]\n")
+
+ else:
dbt_parser.set_connection()
if config_prod_database is None:
@@ -91,46 +100,58 @@ def dbt_diff(
for model in models:
diff_vars = _get_diff_vars(
- dbt_parser, config_prod_database, config_prod_schema, model, datasource_id, custom_schemas
+ dbt_parser, config_prod_database, config_prod_schema, config_prod_custom_schema, model
)
- if is_cloud and len(diff_vars.primary_keys) > 0:
- _cloud_diff(diff_vars)
- elif not is_cloud and len(diff_vars.primary_keys) > 0:
- _local_diff(diff_vars)
+ if diff_vars.primary_keys:
+ if is_cloud:
+ diff_thread = run_as_daemon(_cloud_diff, diff_vars, datasource_id, api)
+ diff_threads.append(diff_thread)
+ else:
+ _local_diff(diff_vars)
else:
rich.print(
- "[red]"
- + ".".join(diff_vars.prod_path)
- + " <> "
- + ".".join(diff_vars.dev_path)
- + "[/] \n"
- + "Skipped due to missing primary-key tag(s).\n"
+ _diff_output_base(".".join(diff_vars.dev_path), ".".join(diff_vars.prod_path))
+ + "Skipped due to unknown primary key. Add uniqueness tests, meta, or tags.\n"
)
- rich.print("Diffs Complete!")
+ # wait for all threads
+ if diff_threads:
+ for thread in diff_threads:
+ thread.join()
def _get_diff_vars(
dbt_parser: "DbtParser",
config_prod_database: Optional[str],
config_prod_schema: Optional[str],
+ config_prod_custom_schema: Optional[str],
model,
- datasource_id: int,
- custom_schemas: bool,
) -> DiffVars:
dev_database = model.database
dev_schema = model.schema_
- primary_keys = dbt_parser.get_primary_keys(model)
+
+ primary_keys = dbt_parser.get_pk_from_model(model, dbt_parser.unique_columns, "primary-key")
prod_database = config_prod_database if config_prod_database else dev_database
- prod_schema = config_prod_schema if config_prod_schema else dev_schema
- # if project has custom schemas (default)
- # need to construct the prod schema as _
- # https://docs.getdbt.com/docs/build/custom-schemas
- if custom_schemas and model.config.schema_:
- prod_schema = prod_schema + "_" + model.config.schema_
+ # prod schema name differs from dev schema name
+ if config_prod_schema:
+ custom_schema = model.config.schema_
+
+ # the model has a custom schema config(schema='some_schema')
+ if custom_schema:
+ if not config_prod_custom_schema:
+ raise ValueError(
+ f"Found a custom schema on model {model.name}, but no value for\nvars:\n data_diff:\n prod_custom_schema:\nPlease set a value!\n"
+ + "For more details see: https://docs.datafold.com/development_testing/open_source"
+ )
+            prod_schema = config_prod_custom_schema.replace("<custom_schema>", custom_schema)
+ # no custom schema, use the default
+ else:
+ prod_schema = config_prod_schema
+ else:
+ prod_schema = dev_schema
if dbt_parser.requires_upper:
dev_qualified_list = [x.upper() for x in [dev_database, dev_schema, model.alias]]
@@ -140,21 +161,27 @@ def _get_diff_vars(
dev_qualified_list = [dev_database, dev_schema, model.alias]
prod_qualified_list = [prod_database, prod_schema, model.alias]
+ where_filter = None
+ if model.meta:
+ try:
+ where_filter = model.meta["datafold"]["datadiff"]["filter"]
+ except KeyError:
+ pass
+
return DiffVars(
- dev_qualified_list, prod_qualified_list, primary_keys, datasource_id, dbt_parser.connection, dbt_parser.threads
+ dev_qualified_list, prod_qualified_list, primary_keys, dbt_parser.connection, dbt_parser.threads, where_filter
)
def _local_diff(diff_vars: DiffVars) -> None:
column_diffs_str = ""
- dev_qualified_string = ".".join(diff_vars.dev_path)
- prod_qualified_string = ".".join(diff_vars.prod_path)
+ dev_qualified_str = ".".join(diff_vars.dev_path)
+ prod_qualified_str = ".".join(diff_vars.prod_path)
+ diff_output_str = _diff_output_base(dev_qualified_str, prod_qualified_str)
- table1 = connect_to_table(
- diff_vars.connection, dev_qualified_string, tuple(diff_vars.primary_keys), diff_vars.threads
- )
+ table1 = connect_to_table(diff_vars.connection, dev_qualified_str, tuple(diff_vars.primary_keys), diff_vars.threads)
table2 = connect_to_table(
- diff_vars.connection, prod_qualified_string, tuple(diff_vars.primary_keys), diff_vars.threads
+ diff_vars.connection, prod_qualified_str, tuple(diff_vars.primary_keys), diff_vars.threads
)
table1_columns = list(table1.get_schema())
@@ -162,16 +189,9 @@ def _local_diff(diff_vars: DiffVars) -> None:
table2_columns = list(table2.get_schema())
# Not ideal, but we don't have more specific exceptions yet
except Exception as ex:
- logging.info(ex)
- rich.print(
- "[red]"
- + prod_qualified_string
- + " <> "
- + dev_qualified_string
- + "[/] \n"
- + column_diffs_str
- + "[green]New model or no access to prod table.[/] \n"
- )
+ logger.debug(ex)
+ diff_output_str += "[red]New model or no access to prod table.[/] \n"
+ rich.print(diff_output_str)
return
mutual_set = set(table1_columns) & set(table2_columns)
@@ -187,78 +207,109 @@ def _local_diff(diff_vars: DiffVars) -> None:
mutual_set = mutual_set - set(diff_vars.primary_keys)
extra_columns = tuple(mutual_set)
- diff = diff_tables(table1, table2, threaded=True, algorithm=Algorithm.JOINDIFF, extra_columns=extra_columns)
+ diff = diff_tables(
+ table1,
+ table2,
+ threaded=True,
+ algorithm=Algorithm.JOINDIFF,
+ extra_columns=extra_columns,
+ where=diff_vars.where_filter,
+ )
if list(diff):
- rich.print(
- "[red]"
- + prod_qualified_string
- + " <> "
- + dev_qualified_string
- + "[/] \n"
- + column_diffs_str
- + diff.get_stats_string(is_dbt=True)
- + "\n"
- )
+ diff_output_str += f"{column_diffs_str}{diff.get_stats_string(is_dbt=True)} \n"
+ rich.print(diff_output_str)
else:
- rich.print(
- "[red]"
- + prod_qualified_string
- + " <> "
- + dev_qualified_string
- + "[/] \n"
- + column_diffs_str
- + "[green]No row differences[/] \n"
- )
+ diff_output_str += f"{column_diffs_str}[bold][green]No row differences[/][/] \n"
+ rich.print(diff_output_str)
+
+def _initialize_api() -> Optional[DatafoldAPI]:
+ datafold_host = os.environ.get("DATAFOLD_HOST")
+ if datafold_host is None:
+ datafold_host = "https://app.datafold.com"
+ datafold_host = datafold_host.rstrip("/")
+ rich.print(f"Cloud datafold host: {datafold_host}")
-def _cloud_diff(diff_vars: DiffVars) -> None:
api_key = os.environ.get("DATAFOLD_API_KEY")
+ if not api_key:
+ rich.print("[red]API key not found. Getting from the keyring service")
+ api_key = keyring.get_password("data-diff", "DATAFOLD_API_KEY")
+ if not api_key:
+ rich.print("[red]API key not found, add it as an environment variable called DATAFOLD_API_KEY.")
+
+ yes_or_no = Confirm.ask("Would you like to generate a new API key?")
+ if yes_or_no:
+ webbrowser.open(f"{datafold_host}/login?next={datafold_host}/users/me")
+            rich.print('After generating a key, run this in your terminal: "export DATAFOLD_API_KEY=<your_api_key>"')
+ return None
+ else:
+ raise ValueError("Cannot initialize API because the API key is not provided")
+
+ rich.print("Saving the API key to the system keyring service")
+ try:
+ keyring.set_password("data-diff", "DATAFOLD_API_KEY", api_key)
+ except Exception as e:
+ rich.print(f"[red]Failed when saving the API key to the system keyring service. Reason: {e}")
+
+ return DatafoldAPI(api_key=api_key, host=datafold_host)
+
+
+def _cloud_diff(diff_vars: DiffVars, datasource_id: int, api: DatafoldAPI) -> None:
+ diff_output_str = _diff_output_base(".".join(diff_vars.dev_path), ".".join(diff_vars.prod_path))
+ payload = TCloudApiDataDiff(
+ data_source1_id=datasource_id,
+ data_source2_id=datasource_id,
+ table1=diff_vars.prod_path,
+ table2=diff_vars.dev_path,
+ pk_columns=diff_vars.primary_keys,
+ filter1=diff_vars.where_filter,
+ filter2=diff_vars.where_filter,
+ )
- if diff_vars.datasource_id is None:
- raise ValueError(
- "Datasource ID not found, include it as a dbt variable in the dbt_project.yml. \nvars:\n data_diff:\n datasource_id: 1234"
- )
- if api_key is None:
- raise ValueError("API key not found, add it as an environment variable called DATAFOLD_API_KEY.")
-
- url = "https://app.datafold.com/api/v1/datadiffs"
-
- payload = {
- "data_source1_id": diff_vars.datasource_id,
- "data_source2_id": diff_vars.datasource_id,
- "table1": diff_vars.prod_path,
- "table2": diff_vars.dev_path,
- "pk_columns": diff_vars.primary_keys,
- }
-
- headers = {
- "Authorization": f"Key {api_key}",
- "Content-Type": "application/json",
- }
if is_tracking_enabled():
- event_json = create_start_event_json({"is_cloud": True, "datasource_id": diff_vars.datasource_id})
+ event_json = create_start_event_json({"is_cloud": True, "datasource_id": datasource_id})
run_as_daemon(send_event_json, event_json)
start = time.monotonic()
error = None
diff_id = None
+ diff_url = None
try:
- response = requests.request("POST", url, headers=headers, json=payload, timeout=30)
- response.raise_for_status()
- data = response.json()
- diff_id = data["id"]
- # TODO in future we should support self hosted datafold
- diff_url = f"https://app.datafold.com/datadiffs/{diff_id}/overview"
- rich.print(
- "[red]"
- + ".".join(diff_vars.prod_path)
- + " <> "
- + ".".join(diff_vars.dev_path)
- + "[/] \n Diff in progress: \n "
- + diff_url
- + "\n"
- )
+ diff_id = api.create_data_diff(payload=payload)
+ diff_url = f"{api.host}/datadiffs/{diff_id}/overview"
+ rich.print(f"{diff_vars.dev_path[2]}: {diff_url}")
+
+ if diff_id is None:
+ raise Exception(f"Api response did not contain a diff_id")
+
+ diff_results = api.poll_data_diff_results(diff_id)
+
+ rows_added_count = diff_results.pks.exclusives[1]
+ rows_removed_count = diff_results.pks.exclusives[0]
+
+ rows_updated = diff_results.values.rows_with_differences
+ total_rows = diff_results.values.total_rows
+ rows_unchanged = int(total_rows) - int(rows_updated)
+ diff_percent_list = {
+ x.column_name: str(x.match) + "%" for x in diff_results.values.columns_diff_stats if x.match != 100.0
+ }
+
+ if any([rows_added_count, rows_removed_count, rows_updated]):
+ diff_output = dbt_diff_string_template(
+ rows_added_count,
+ rows_removed_count,
+ rows_updated,
+ str(rows_unchanged),
+ diff_percent_list,
+ "Value Match Percent:",
+ )
+ diff_output_str += f"{diff_url}\n {diff_output} \n"
+ rich.print(diff_output_str)
+ else:
+ diff_output_str += f"{diff_url}\n [green]No row differences[/] \n"
+ rich.print(diff_output_str)
+
except BaseException as ex: # Catch KeyboardInterrupt too
error = ex
finally:
@@ -282,158 +333,12 @@ def _cloud_diff(diff_vars: DiffVars) -> None:
send_event_json(event_json)
if error:
- raise error
-
-
-class DbtParser:
- def __init__(self, profiles_dir_override: str, project_dir_override: str, is_cloud: bool) -> None:
- self.profiles_dir = Path(profiles_dir_override or default_profiles_dir())
- self.project_dir = Path(project_dir_override or default_project_dir())
- self.is_cloud = is_cloud
- self.connection = None
- self.project_dict = None
- self.requires_upper = False
- self.threads = None
-
- self.parse_run_results, self.parse_manifest, self.ProfileRenderer, self.yaml = import_dbt()
-
- def get_datadiff_variables(self) -> dict:
- return self.project_dict.get("vars").get("data_diff")
-
- def get_models(self):
- with open(self.project_dir / RUN_RESULTS_PATH) as run_results:
- run_results_dict = json.load(run_results)
- run_results_obj = self.parse_run_results(run_results=run_results_dict)
-
- dbt_version = parse_version(run_results_obj.metadata.dbt_version)
-
- if dbt_version < parse_version("1.3.0"):
- self.profiles_dir = legacy_profiles_dir()
-
- if dbt_version < parse_version(LOWER_DBT_V) or dbt_version >= parse_version(UPPER_DBT_V):
- raise Exception(
- f"Found dbt: v{dbt_version} Expected the dbt project's version to be >= {LOWER_DBT_V} and < {UPPER_DBT_V}"
- )
+ rich.print(diff_output_str)
+ if diff_id:
+ diff_url = f"{api.host}/datadiffs/{diff_id}/overview"
+ rich.print(f"{diff_url} \n")
+ logger.error(error)
- with open(self.project_dir / MANIFEST_PATH) as manifest:
- manifest_dict = json.load(manifest)
- manifest_obj = self.parse_manifest(manifest=manifest_dict)
-
- success_models = [x.unique_id for x in run_results_obj.results if x.status.name == "success"]
- models = [manifest_obj.nodes.get(x) for x in success_models]
- if not models:
- raise ValueError("Expected > 0 successful models runs from the last dbt command.")
-
- rich.print(f"Found {str(len(models))} successful model runs from the last dbt command.")
- return models
-
- def get_primary_keys(self, model):
- return list((x.name for x in model.columns.values() if "primary-key" in x.tags))
-
- def set_project_dict(self):
- with open(self.project_dir / PROJECT_FILE) as project:
- self.project_dict = self.yaml.safe_load(project)
-
- def _get_connection_creds(self) -> Tuple[Dict[str, str], str]:
- profiles_path = self.profiles_dir / PROFILES_FILE
- with open(profiles_path) as profiles:
- profiles = self.yaml.safe_load(profiles)
-
- dbt_profile_var = self.project_dict.get("profile")
-
- profile = get_from_dict_with_raise(
- profiles, dbt_profile_var, f"No profile '{dbt_profile_var}' found in '{profiles_path}'."
- )
- # values can contain env_vars
- rendered_profile = self.ProfileRenderer().render_data(profile)
- profile_target = get_from_dict_with_raise(
- rendered_profile, "target", f"No target found in profile '{dbt_profile_var}' in '{profiles_path}'."
- )
- outputs = get_from_dict_with_raise(
- rendered_profile, "outputs", f"No outputs found in profile '{dbt_profile_var}' in '{profiles_path}'."
- )
- credentials = get_from_dict_with_raise(
- outputs,
- profile_target,
- f"No credentials found for target '{profile_target}' in profile '{dbt_profile_var}' in '{profiles_path}'.",
- )
- conn_type = get_from_dict_with_raise(
- credentials,
- "type",
- f"No type found for target '{profile_target}' in profile '{dbt_profile_var}' in '{profiles_path}'.",
- )
- conn_type = conn_type.lower()
-
- return credentials, conn_type
-
- def set_connection(self):
- credentials, conn_type = self._get_connection_creds()
-
- if conn_type == "snowflake":
- if credentials.get("password") is None or credentials.get("private_key_path") is not None:
- raise Exception("Only password authentication is currently supported for Snowflake.")
- conn_info = {
- "driver": conn_type,
- "user": credentials.get("user"),
- "password": credentials.get("password"),
- "account": credentials.get("account"),
- "database": credentials.get("database"),
- "warehouse": credentials.get("warehouse"),
- "role": credentials.get("role"),
- "schema": credentials.get("schema"),
- }
- self.threads = credentials.get("threads")
- self.requires_upper = True
- elif conn_type == "bigquery":
- method = credentials.get("method")
- # there are many connection types https://docs.getdbt.com/reference/warehouse-setups/bigquery-setup#oauth-via-gcloud
- # this assumes that the user is auth'd via `gcloud auth application-default login`
- if method is None or method != "oauth":
- raise Exception("Oauth is the current method supported for Big Query.")
- conn_info = {
- "driver": conn_type,
- "project": credentials.get("project"),
- "dataset": credentials.get("dataset"),
- }
- self.threads = credentials.get("threads")
- elif conn_type == "duckdb":
- conn_info = {
- "driver": conn_type,
- "filepath": credentials.get("path"),
- }
- elif conn_type == "redshift":
- if credentials.get("password") is None or credentials.get("method") == "iam":
- raise Exception("Only password authentication is currently supported for Redshift.")
- conn_info = {
- "driver": conn_type,
- "host": credentials.get("host"),
- "user": credentials.get("user"),
- "password": credentials.get("password"),
- "port": credentials.get("port"),
- "dbname": credentials.get("dbname"),
- }
- self.threads = credentials.get("threads")
- elif conn_type == "databricks":
- conn_info = {
- "driver": conn_type,
- "catalog": credentials.get("catalog"),
- "server_hostname": credentials.get("host"),
- "http_path": credentials.get("http_path"),
- "schema": credentials.get("schema"),
- "access_token": credentials.get("token"),
- }
- self.threads = credentials.get("threads")
- elif conn_type == "postgres":
- conn_info = {
- "driver": "postgresql",
- "host": credentials.get("host"),
- "user": credentials.get("user"),
- "password": credentials.get("password"),
- "port": credentials.get("port"),
- "dbname": credentials.get("dbname") or credentials.get("database"),
- }
- self.threads = credentials.get("threads")
- else:
- raise NotImplementedError(f"Provider {conn_type} is not yet supported for dbt diffs")
- self.connection = conn_info
+def _diff_output_base(dev_path: str, prod_path: str) -> str:
+ return f"\n[green]{prod_path} <> {dev_path}[/] \n"
diff --git a/data_diff/dbt_parser.py b/data_diff/dbt_parser.py
new file mode 100644
index 00000000..d03e11a9
--- /dev/null
+++ b/data_diff/dbt_parser.py
@@ -0,0 +1,390 @@
+from argparse import Namespace
+from collections import defaultdict
+import json
+import os
+from pathlib import Path
+from typing import List, Dict, Tuple, Set, Optional
+
+from packaging.version import parse as parse_version
+
+from .utils import getLogger, get_from_dict_with_raise
+from .version import __version__
+
+
+logger = getLogger(__name__)
+
+
+def import_dbt_dependencies():
+ try:
+ from dbt_artifacts_parser.parser import parse_run_results, parse_manifest
+ from dbt.config.renderer import ProfileRenderer
+ import yaml
+ except ImportError:
+ raise RuntimeError("Could not import 'dbt' package. You can install it using: pip install 'data-diff[dbt]'.")
+
+ # dbt 1.5+ specific stuff to power selection of models
+ try:
+ # ProfileRenderer.render_data() fails without instantiating global flag MACRO_DEBUGGING in dbt-core 1.5
+ from dbt.flags import set_flags
+
+ set_flags(Namespace(MACRO_DEBUGGING=False))
+ except:
+ pass
+
+ try:
+ from dbt.cli.main import dbtRunner
+ except ImportError:
+ dbtRunner = None
+
+ if dbtRunner is not None:
+ dbt_runner = dbtRunner()
+ else:
+ dbt_runner = None
+
+ return parse_run_results, parse_manifest, ProfileRenderer, yaml, dbt_runner
+
+
+RUN_RESULTS_PATH = "target/run_results.json"
+MANIFEST_PATH = "target/manifest.json"
+PROJECT_FILE = "dbt_project.yml"
+PROFILES_FILE = "profiles.yml"
+LOWER_DBT_V = "1.0.0"
+UPPER_DBT_V = "1.6.0"
+
+
+# https://github.com/dbt-labs/dbt-core/blob/c952d44ec5c2506995fbad75320acbae49125d3d/core/dbt/cli/resolvers.py#L6
+def default_project_dir() -> Path:
+ paths = list(Path.cwd().parents)
+ paths.insert(0, Path.cwd())
+ return next((x for x in paths if (x / PROJECT_FILE).exists()), Path.cwd())
+
+
+# https://github.com/dbt-labs/dbt-core/blob/c952d44ec5c2506995fbad75320acbae49125d3d/core/dbt/cli/resolvers.py#L12
+def default_profiles_dir() -> Path:
+ return Path.cwd() if (Path.cwd() / PROFILES_FILE).exists() else Path.home() / ".dbt"
+
+
+def legacy_profiles_dir() -> Path:
+ return Path.home() / ".dbt"
+
+
+class DbtParser:
+ def __init__(self, profiles_dir_override: str, project_dir_override: str) -> None:
+ (
+ self.parse_run_results,
+ self.parse_manifest,
+ self.ProfileRenderer,
+ self.yaml,
+ self.dbt_runner,
+ ) = import_dbt_dependencies()
+ self.profiles_dir = Path(profiles_dir_override or default_profiles_dir())
+ self.project_dir = Path(project_dir_override or default_project_dir())
+ self.connection = None
+ self.project_dict = self.get_project_dict()
+ self.manifest_obj = self.get_manifest_obj()
+ self.dbt_user_id = self.manifest_obj.metadata.user_id
+ self.dbt_version = self.manifest_obj.metadata.dbt_version
+ self.dbt_project_id = self.manifest_obj.metadata.project_id
+ self.requires_upper = False
+ self.threads = None
+ self.unique_columns = self.get_unique_columns()
+
+ def get_datadiff_variables(self) -> dict:
+ doc_url = "https://docs.datafold.com/development_testing/open_source#configure-your-dbt-project"
+ error_message = f"vars: data_diff: section not found in dbt_project.yml.\n\nTo solve this, please configure your dbt project: \n{doc_url}\n"
+ vars = get_from_dict_with_raise(self.project_dict, "vars", error_message)
+ return get_from_dict_with_raise(vars, "data_diff", error_message)
+
+ def get_models(self, dbt_selection: Optional[str] = None):
+ dbt_version = parse_version(self.dbt_version)
+ if dbt_selection:
+ if (dbt_version.major, dbt_version.minor) >= (1, 5):
+ if self.dbt_runner:
+ return self.get_dbt_selection_models(dbt_selection)
+ # edge case if running data-diff from a separate env than dbt (likely local development)
+ else:
+ raise Exception(
+ "data-diff is using a dbt-core version < 1.5, update the environment's dbt-core version via pip install 'dbt-core>=1.5' in order to use `--select`"
+ )
+ else:
+ raise Exception(
+ f"Use of the `--select` feature requires dbt >= 1.5. Found dbt manifest: v{dbt_version}"
+ )
+ else:
+ return self.get_run_results_models()
+
+ def get_dbt_selection_models(self, dbt_selection: str) -> List[str]:
+ # log level and format settings needed to prevent dbt from printing to stdout
+ # ls command is used to get the list of model unique_ids
+ results = self.dbt_runner.invoke(
+ [
+ "--log-format",
+ "json",
+ "--log-level",
+ "none",
+ "ls",
+ "--select",
+ dbt_selection,
+ "--resource-type",
+ "model",
+ "--output",
+ "json",
+ "--output-keys",
+ "unique_id",
+ "--project-dir",
+ self.project_dir,
+ ]
+ )
+ if results.exception:
+ raise results.exception
+ elif results.success and results.result:
+ model_list = [json.loads(model)["unique_id"] for model in results.result]
+ models = [self.manifest_obj.nodes.get(x) for x in model_list]
+ return models
+ elif not results.result:
+ raise Exception(f"No dbt models found for `--select {dbt_selection}`")
+ else:
+ logger.debug(str(results))
+ raise Exception("Encountered an unexpected error while finding `--select` models")
+
+ def get_run_results_models(self):
+ with open(self.project_dir / RUN_RESULTS_PATH) as run_results:
+ logger.info(f"Parsing file {RUN_RESULTS_PATH}")
+ run_results_dict = json.load(run_results)
+ run_results_obj = self.parse_run_results(run_results=run_results_dict)
+
+ dbt_version = parse_version(run_results_obj.metadata.dbt_version)
+
+ if dbt_version < parse_version("1.3.0"):
+ self.profiles_dir = legacy_profiles_dir()
+
+ if dbt_version < parse_version(LOWER_DBT_V):
+ raise Exception(f"Found dbt: v{dbt_version} Expected the dbt project's version to be >= {LOWER_DBT_V}")
+ elif dbt_version >= parse_version(UPPER_DBT_V):
+ logger.warning(
+ f"{dbt_version} is a recent version of dbt and may not be fully tested with data-diff! \nPlease report any issues to https://github.com/datafold/data-diff/issues"
+ )
+
+ success_models = [x.unique_id for x in run_results_obj.results if x.status.name == "success"]
+ models = [self.manifest_obj.nodes.get(x) for x in success_models]
+ if not models:
+            raise ValueError("Expected > 0 successful model runs from the last dbt command.")
+
+ print(f"Running with data-diff={__version__}\n")
+ return models
+
+ def get_manifest_obj(self):
+ with open(self.project_dir / MANIFEST_PATH) as manifest:
+ logger.info(f"Parsing file {MANIFEST_PATH}")
+ manifest_dict = json.load(manifest)
+ manifest_obj = self.parse_manifest(manifest=manifest_dict)
+ return manifest_obj
+
+ def get_project_dict(self):
+ with open(self.project_dir / PROJECT_FILE) as project:
+ logger.info(f"Parsing file {PROJECT_FILE}")
+ project_dict = self.yaml.safe_load(project)
+ return project_dict
+
+ def get_connection_creds(self) -> Tuple[Dict[str, str], str]:
+ profiles_path = self.profiles_dir / PROFILES_FILE
+ with open(profiles_path) as profiles:
+ logger.info(f"Parsing file {profiles_path}")
+ profiles = self.yaml.safe_load(profiles)
+
+ dbt_profile_var = self.project_dict.get("profile")
+
+ profile = get_from_dict_with_raise(
+ profiles, dbt_profile_var, f"No profile '{dbt_profile_var}' found in '{profiles_path}'."
+ )
+ # values can contain env_vars
+ rendered_profile = self.ProfileRenderer().render_data(profile)
+ profile_target = get_from_dict_with_raise(
+ rendered_profile, "target", f"No target found in profile '{dbt_profile_var}' in '{profiles_path}'."
+ )
+ outputs = get_from_dict_with_raise(
+ rendered_profile, "outputs", f"No outputs found in profile '{dbt_profile_var}' in '{profiles_path}'."
+ )
+ credentials = get_from_dict_with_raise(
+ outputs,
+ profile_target,
+ f"No credentials found for target '{profile_target}' in profile '{dbt_profile_var}' in '{profiles_path}'.",
+ )
+ conn_type = get_from_dict_with_raise(
+ credentials,
+ "type",
+ f"No type found for target '{profile_target}' in profile '{dbt_profile_var}' in '{profiles_path}'.",
+ )
+ conn_type = conn_type.lower()
+
+ return credentials, conn_type
+
+ def set_connection(self):
+ credentials, conn_type = self.get_connection_creds()
+
+ if conn_type == "snowflake":
+ conn_info = {
+ "driver": conn_type,
+ "user": credentials.get("user"),
+ "account": credentials.get("account"),
+ "database": credentials.get("database"),
+ "warehouse": credentials.get("warehouse"),
+ "role": credentials.get("role"),
+ "schema": credentials.get("schema"),
+ "insecure_mode": credentials.get("insecure_mode", False),
+ "client_session_keep_alive": credentials.get("client_session_keep_alive", False),
+ }
+ self.threads = credentials.get("threads")
+ self.requires_upper = True
+
+ if credentials.get("private_key_path") is not None:
+ if credentials.get("password") is not None:
+ raise Exception("Cannot use password and key at the same time")
+ conn_info["key"] = credentials.get("private_key_path")
+ conn_info["private_key_passphrase"] = credentials.get("private_key_passphrase")
+ elif credentials.get("authenticator") is not None:
+ conn_info["authenticator"] = credentials.get("authenticator")
+ conn_info["password"] = credentials.get("password")
+ elif credentials.get("password") is not None:
+ conn_info["password"] = credentials.get("password")
+ else:
+ raise Exception("Snowflake: unsupported auth method")
+ elif conn_type == "bigquery":
+ method = credentials.get("method")
+ # there are many connection types https://docs.getdbt.com/reference/warehouse-setups/bigquery-setup#oauth-via-gcloud
+ # this assumes that the user is auth'd via `gcloud auth application-default login`
+ if method is None or method != "oauth":
+                raise Exception("OAuth is currently the only supported method for BigQuery.")
+ conn_info = {
+ "driver": conn_type,
+ "project": credentials.get("project"),
+ "dataset": credentials.get("dataset"),
+ }
+ self.threads = credentials.get("threads")
+ elif conn_type == "duckdb":
+ conn_info = {
+ "driver": conn_type,
+ "filepath": credentials.get("path"),
+ }
+ elif conn_type == "redshift":
+ if (credentials.get("pass") is None and credentials.get("password") is None) or credentials.get(
+ "method"
+ ) == "iam":
+ raise Exception("Only password authentication is currently supported for Redshift.")
+ conn_info = {
+ "driver": conn_type,
+ "host": credentials.get("host"),
+ "user": credentials.get("user"),
+ "password": credentials.get("password") or credentials.get("pass"),
+ "port": credentials.get("port"),
+ "dbname": credentials.get("dbname"),
+ }
+ self.threads = credentials.get("threads")
+ elif conn_type == "databricks":
+ conn_info = {
+ "driver": conn_type,
+ "catalog": credentials.get("catalog"),
+ "server_hostname": credentials.get("host"),
+ "http_path": credentials.get("http_path"),
+ "schema": credentials.get("schema"),
+ "access_token": credentials.get("token"),
+ }
+ self.threads = credentials.get("threads")
+ elif conn_type == "postgres":
+ conn_info = {
+ "driver": "postgresql",
+ "host": credentials.get("host"),
+ "user": credentials.get("user"),
+ "password": credentials.get("password"),
+ "port": credentials.get("port"),
+ "dbname": credentials.get("dbname") or credentials.get("database"),
+ }
+ self.threads = credentials.get("threads")
+ else:
+ raise NotImplementedError(f"Provider {conn_type} is not yet supported for dbt diffs")
+
+ self.connection = conn_info
+
+ def get_pk_from_model(self, node, unique_columns: dict, pk_tag: str) -> List[str]:
+ try:
+ # Get a set of all the column names
+ column_names = {name for name, params in node.columns.items()}
+ # Check if the tag is present on a table level
+ if pk_tag in node.meta:
+ # Get all the PKs that are also present as a column
+                pks = [pk for pk in node.meta[pk_tag] if pk in column_names]
+ if pks:
+ # If there are any left, return it
+ logger.debug("Found PKs via Table META: " + str(pks))
+ return pks
+
+ from_meta = [name for name, params in node.columns.items() if pk_tag in params.meta] or None
+ if from_meta:
+ logger.debug("Found PKs via META: " + str(from_meta))
+ return from_meta
+
+ from_tags = [name for name, params in node.columns.items() if pk_tag in params.tags] or None
+ if from_tags:
+ logger.debug("Found PKs via Tags: " + str(from_tags))
+ return from_tags
+
+ if node.unique_id in unique_columns:
+ from_uniq = unique_columns.get(node.unique_id)
+ if from_uniq is not None:
+ logger.debug("Found PKs via Uniqueness tests: " + str(from_uniq))
+ return list(from_uniq)
+
+ except (KeyError, IndexError, TypeError) as e:
+ raise e
+
+ logger.debug("Found no PKs")
+ return []
+
+ def get_unique_columns(self) -> Dict[str, Set[str]]:
+ manifest = self.manifest_obj
+ cols_by_uid = defaultdict(set)
+ for node in manifest.nodes.values():
+ try:
+ if not (node.resource_type.value == "test" and hasattr(node, "test_metadata")):
+ continue
+
+ if not node.depends_on or not node.depends_on.nodes:
+ continue
+
+ uid = node.depends_on.nodes[0]
+
+ # sources can have tests and are not in manifest.nodes
+ # skip as source unique columns are not needed
+ if uid.startswith("source."):
+ continue
+
+ model_node = manifest.nodes[uid]
+
+ if node.test_metadata.name == "unique":
+ column_name: str = node.test_metadata.kwargs["column_name"]
+ for col in self._parse_concat_pk_definition(column_name):
+ if model_node is None or col in model_node.columns:
+ # skip anything that is not a column.
+ # for example, string literals used in concat
+ # like "pk1 || '-' || pk2"
+ cols_by_uid[uid].add(col)
+
+ if node.test_metadata.name == "unique_combination_of_columns":
+ for col in node.test_metadata.kwargs["combination_of_columns"]:
+ cols_by_uid[uid].add(col)
+
+ except (KeyError, IndexError, TypeError) as e:
+ logger.warning("Failure while finding unique cols: %s", e)
+
+ return cols_by_uid
+
+ def _parse_concat_pk_definition(self, definition: str) -> List[str]:
+ definition = definition.strip()
+ if definition.lower().startswith("concat(") and definition.endswith(")"):
+ definition = definition[7:-1] # Removes concat( and )
+ columns = definition.split(",")
+ else:
+ columns = definition.split("||")
+
+ stripped_columns = [col.strip('" ()') for col in columns]
+ return stripped_columns
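
For orientation, and not as part of the patch itself, here is a minimal sketch of what `_parse_concat_pk_definition` produces for a pipe-concatenated uniqueness test; the column names are illustrative.

```python
# Illustration only: mirrors _parse_concat_pk_definition above.
definition = "pk1 || '-' || pk2"   # hypothetical `unique` test defined on a concat expression
columns = [col.strip('" ()') for col in definition.split("||")]
print(columns)  # ['pk1', "'-'", 'pk2']
# The string literal "'-'" is later discarded by get_unique_columns(),
# because it is not a column of the model.
```
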
diff --git a/data_diff/diff_tables.py b/data_diff/diff_tables.py
index c628ca06..aaf56c9c 100644
--- a/data_diff/diff_tables.py
+++ b/data_diff/diff_tables.py
@@ -14,11 +14,11 @@
from data_diff.info_tree import InfoTree, SegmentInfo
-from .utils import run_as_daemon, safezip, getLogger, truncate_error, Vector
+from .utils import dbt_diff_string_template, run_as_daemon, safezip, getLogger, truncate_error, Vector
from .thread_utils import ThreadedYielder
from .table_segment import TableSegment, create_mesh_from_points
from .tracking import create_end_event_json, create_start_event_json, send_event_json, is_tracking_enabled
-from sqeleton.abcs import IKey
+from data_diff.sqeleton.abcs import IKey
logger = getLogger(__name__)
@@ -139,21 +139,16 @@ def get_stats_string(self, is_dbt: bool = False):
diff_stats = self._get_stats(is_dbt)
if is_dbt:
- string_output = "\n| Rows Added\t| Rows Removed\n"
- string_output += "------------------------------------------------------------\n"
-
- string_output += f"| {diff_stats.diff_by_sign['-']}\t\t| {diff_stats.diff_by_sign['+']}\n"
- string_output += "------------------------------------------------------------\n\n"
- string_output += f"Updated Rows: {diff_stats.diff_by_sign['!']}\n"
- string_output += f"Unchanged Rows: {diff_stats.unchanged}\n\n"
-
- string_output += f"Values Updated:"
-
- for k, v in diff_stats.extra_column_diffs.items():
- string_output += f"\n{k}: {v}"
+ string_output = dbt_diff_string_template(
+ diff_stats.diff_by_sign["-"],
+ diff_stats.diff_by_sign["+"],
+ diff_stats.diff_by_sign["!"],
+ diff_stats.unchanged,
+ diff_stats.extra_column_diffs,
+ "Values Updated:",
+ )
else:
-
string_output = ""
string_output += f"{diff_stats.table1_count} rows in table A\n"
string_output += f"{diff_stats.table2_count} rows in table B\n"
diff --git a/data_diff/hashdiff_tables.py b/data_diff/hashdiff_tables.py
index 4d1b03b2..45049ae6 100644
--- a/data_diff/hashdiff_tables.py
+++ b/data_diff/hashdiff_tables.py
@@ -7,10 +7,10 @@
from runtype import dataclass
-from sqeleton.abcs import ColType_UUID, NumericType, PrecisionType, StringType, Boolean
+from data_diff.sqeleton.abcs import ColType_UUID, NumericType, PrecisionType, StringType, Boolean, JSON
from .info_tree import InfoTree
-from .utils import safezip
+from .utils import safezip, diffs_are_equiv_jsons
from .thread_utils import ThreadedYielder
from .table_segment import TableSegment
@@ -24,7 +24,7 @@
logger = logging.getLogger("hashdiff_tables")
-def diff_sets(a: set, b: set) -> Iterator:
+def diff_sets(a: list, b: list, json_cols: dict = None) -> Iterator:
sa = set(a)
sb = set(b)
@@ -38,7 +38,17 @@ def diff_sets(a: set, b: set) -> Iterator:
if row not in sa:
d[row[0]].append(("+", row))
+ warned_diff_cols = set()
for _k, v in sorted(d.items(), key=lambda i: i[0]):
+ if json_cols:
+ parsed_match, overriden_diff_cols = diffs_are_equiv_jsons(v, json_cols)
+ if parsed_match:
+ to_warn = overriden_diff_cols - warned_diff_cols
+ for w in to_warn:
+ logger.warning(f"Equivalent JSON objects with different string representations detected "
+ f"in column '{w}'. These cases are NOT reported as differences.")
+ warned_diff_cols.add(w)
+ continue
yield from v
@@ -194,7 +204,9 @@ def _bisect_and_diff_segments(
# This saves time, as bisection speed is limited by ping and query performance.
if max_rows < self.bisection_threshold or max_space_size < self.bisection_factor * 2:
rows1, rows2 = self._threaded_call("get_values", [table1, table2])
- diff = list(diff_sets(rows1, rows2))
+ json_cols = {i: colname for i, colname in enumerate(table1.extra_columns)
+ if isinstance(table1._schema[colname], JSON)}
+ diff = list(diff_sets(rows1, rows2, json_cols))
info_tree.info.set_diff(diff)
info_tree.info.rowcounts = {1: len(rows1), 2: len(rows2)}
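
The new `json_cols` handling relies on `diffs_are_equiv_jsons` from `data_diff/utils.py`, which is not shown in this hunk. A minimal sketch of the underlying idea, with illustrative values: two JSON strings that differ only in formatting or key order parse to the same object, so they are logged as a warning rather than reported as a difference.

```python
import json

a = '{"x": 1, "y": [2, 3]}'
b = '{ "y": [2, 3], "x": 1 }'

assert a != b                          # the raw strings differ
assert json.loads(a) == json.loads(b)  # the parsed objects are equal -> treated as equivalent
```
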
diff --git a/data_diff/joindiff_tables.py b/data_diff/joindiff_tables.py
index 161fe122..93d806df 100644
--- a/data_diff/joindiff_tables.py
+++ b/data_diff/joindiff_tables.py
@@ -10,9 +10,9 @@
from runtype import dataclass
-from sqeleton.databases import Database, MySQL, BigQuery, Presto, Oracle, Snowflake, DbPath
-from sqeleton.abcs import NumericType
-from sqeleton.queries import (
+from data_diff.sqeleton.databases import Database, MySQL, BigQuery, Presto, Oracle, Snowflake, DbPath
+from data_diff.sqeleton.abcs import NumericType
+from data_diff.sqeleton.queries import (
table,
sum_,
min_,
@@ -27,8 +27,8 @@
this,
Compiler,
)
-from sqeleton.queries.ast_classes import Concat, Count, Expr, Random, TablePath, Code, ITable
-from sqeleton.queries.extras import NormalizeAsString
+from data_diff.sqeleton.queries.ast_classes import Concat, Count, Expr, Random, TablePath, Code, ITable
+from data_diff.sqeleton.queries.extras import NormalizeAsString
from .info_tree import InfoTree
@@ -201,7 +201,6 @@ def _diff_segments(
if self.materialize_to_table
else None,
):
-
assert len(a_cols) == len(b_cols)
logger.debug("Querying for different rows")
diff = db.query(diff_rows, list)
diff --git a/data_diff/query_utils.py b/data_diff/query_utils.py
index 4eb07445..4b963039 100644
--- a/data_diff/query_utils.py
+++ b/data_diff/query_utils.py
@@ -2,8 +2,8 @@
from contextlib import suppress
-from sqeleton.databases import DbPath, QueryError, Oracle
-from sqeleton.queries import table, commit, Expr
+from data_diff.sqeleton.databases import DbPath, QueryError, Oracle
+from data_diff.sqeleton.queries import table, commit, Expr
def _drop_table_oracle(name: DbPath):
diff --git a/data_diff/sqeleton/__init__.py b/data_diff/sqeleton/__init__.py
new file mode 100644
index 00000000..ee19447d
--- /dev/null
+++ b/data_diff/sqeleton/__init__.py
@@ -0,0 +1,2 @@
+from .databases import connect
+from .queries import table, this, SKIP, code
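
As a rough usage sketch (the connection URI is borrowed from the `Connect.__call__` docstring later in this diff, and the table name is hypothetical), the vendored package exposes the same high-level API under the new `data_diff.sqeleton` import path:

```python
from data_diff.sqeleton import connect, table, this

db = connect("mysql://localhost/db")                # assumed reachable database
query = table("ratings").select(this.id).limit(10)  # "ratings" is an illustrative table
rows = db.query(query, list)                        # compile for the dialect and run
```
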
diff --git a/data_diff/sqeleton/__main__.py b/data_diff/sqeleton/__main__.py
new file mode 100644
index 00000000..7bcb0699
--- /dev/null
+++ b/data_diff/sqeleton/__main__.py
@@ -0,0 +1,17 @@
+import click
+from .repl import repl as repl_main
+
+
+@click.group(no_args_is_help=True)
+def main():
+ pass
+
+
+@main.command(no_args_is_help=True)
+@click.argument("database", required=True)
+def repl(database):
+ return repl_main(database)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/data_diff/sqeleton/abcs/__init__.py b/data_diff/sqeleton/abcs/__init__.py
new file mode 100644
index 00000000..3f5a8bf4
--- /dev/null
+++ b/data_diff/sqeleton/abcs/__init__.py
@@ -0,0 +1,15 @@
+from .database_types import (
+ AbstractDatabase,
+ AbstractDialect,
+ DbKey,
+ DbPath,
+ DbTime,
+ IKey,
+ ColType_UUID,
+ NumericType,
+ PrecisionType,
+ StringType,
+ Boolean,
+ JSON,
+)
+from .compiler import AbstractCompiler, Compilable
diff --git a/data_diff/sqeleton/abcs/compiler.py b/data_diff/sqeleton/abcs/compiler.py
new file mode 100644
index 00000000..72fd7578
--- /dev/null
+++ b/data_diff/sqeleton/abcs/compiler.py
@@ -0,0 +1,15 @@
+from typing import Any, Dict
+from abc import ABC, abstractmethod
+
+
+class AbstractCompiler(ABC):
+ @abstractmethod
+ def compile(self, elem: Any, params: Dict[str, Any] = None) -> str:
+ ...
+
+
+class Compilable(ABC):
+ # TODO generic syntax, so we can write Compilable[T] for expressions returning a value of type T
+ @abstractmethod
+ def compile(self, c: AbstractCompiler) -> str:
+ ...
diff --git a/data_diff/sqeleton/abcs/database_types.py b/data_diff/sqeleton/abcs/database_types.py
new file mode 100644
index 00000000..f82e681b
--- /dev/null
+++ b/data_diff/sqeleton/abcs/database_types.py
@@ -0,0 +1,410 @@
+import decimal
+from abc import ABC, abstractmethod
+from typing import Sequence, Optional, Tuple, Union, Dict, List
+from datetime import datetime
+
+from runtype import dataclass
+
+from ..utils import ArithAlphanumeric, ArithUUID, Self, Unknown
+
+
+DbPath = Tuple[str, ...]
+DbKey = Union[int, str, bytes, ArithUUID, ArithAlphanumeric]
+DbTime = datetime
+
+
+@dataclass
+class ColType:
+ supported = True
+
+
+@dataclass
+class PrecisionType(ColType):
+ precision: int
+ rounds: Union[bool, Unknown] = Unknown
+
+
+class Boolean(ColType):
+ precision = 0
+
+
+class TemporalType(PrecisionType):
+ pass
+
+
+class Timestamp(TemporalType):
+ pass
+
+
+class TimestampTZ(TemporalType):
+ pass
+
+
+class Datetime(TemporalType):
+ pass
+
+
+class Date(TemporalType):
+ pass
+
+
+@dataclass
+class NumericType(ColType):
+ # 'precision' signifies how many fractional digits (after the dot) we want to compare
+ precision: int
+
+
+class FractionalType(NumericType):
+ pass
+
+
+class Float(FractionalType):
+ python_type = float
+
+
+class IKey(ABC):
+ "Interface for ColType, for using a column as a key in table."
+
+ @property
+ @abstractmethod
+ def python_type(self) -> type:
+ "Return the equivalent Python type of the key"
+
+ def make_value(self, value):
+ return self.python_type(value)
+
+
+class Decimal(FractionalType, IKey): # Snowflake may use Decimal as a key
+ @property
+ def python_type(self) -> type:
+ if self.precision == 0:
+ return int
+ return decimal.Decimal
+
+
+@dataclass
+class StringType(ColType):
+ python_type = str
+
+
+class ColType_UUID(ColType, IKey):
+ python_type = ArithUUID
+
+
+class ColType_Alphanum(ColType, IKey):
+ python_type = ArithAlphanumeric
+
+
+class Native_UUID(ColType_UUID):
+ pass
+
+
+class String_UUID(ColType_UUID, StringType):
+ pass
+
+
+class String_Alphanum(ColType_Alphanum, StringType):
+ @staticmethod
+ def test_value(value: str) -> bool:
+ try:
+ ArithAlphanumeric(value)
+ return True
+ except ValueError:
+ return False
+
+ def make_value(self, value):
+ return self.python_type(value)
+
+
+class String_VaryingAlphanum(String_Alphanum):
+ pass
+
+
+@dataclass
+class String_FixedAlphanum(String_Alphanum):
+ length: int
+
+ def make_value(self, value):
+ if len(value) != self.length:
+ raise ValueError(f"Expected alphanumeric value of length {self.length}, but got '{value}'.")
+ return self.python_type(value, max_len=self.length)
+
+
+@dataclass
+class Text(StringType):
+ supported = False
+
+
+# In majority of DBMSes, it is called JSON/JSONB. Only in Snowflake, it is OBJECT.
+@dataclass
+class JSON(ColType):
+ pass
+
+
+@dataclass
+class Array(ColType):
+ item_type: ColType
+
+
+# Unlike JSON, structs are not free-form and have a very specific set of fields and their types.
+# We do not parse & use those fields now, but we can do this later.
+# For example, in BigQuery:
+# - https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type
+# - https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#struct_literals
+@dataclass
+class Struct(ColType):
+ pass
+
+
+@dataclass
+class Integer(NumericType, IKey):
+ precision: int = 0
+ python_type: type = int
+
+ def __post_init__(self):
+ assert self.precision == 0
+
+
+@dataclass
+class UnknownColType(ColType):
+ text: str
+
+ supported = False
+
+
+class AbstractDialect(ABC):
+ """Dialect-dependent query expressions"""
+
+ @property
+ @abstractmethod
+ def name(self) -> str:
+ "Name of the dialect"
+
+ @classmethod
+ @abstractmethod
+ def load_mixins(cls, *abstract_mixins) -> Self:
+ "Load a list of mixins that implement the given abstract mixins"
+
+ @property
+ @abstractmethod
+ def ROUNDS_ON_PREC_LOSS(self) -> bool:
+ "True if db rounds real values when losing precision, False if it truncates."
+
+ @abstractmethod
+ def quote(self, s: str):
+ "Quote SQL name"
+
+ @abstractmethod
+ def concat(self, items: List[str]) -> str:
+ "Provide SQL for concatenating a bunch of columns into a string"
+
+ @abstractmethod
+ def is_distinct_from(self, a: str, b: str) -> str:
+ "Provide SQL for a comparison where NULL = NULL is true"
+
+ @abstractmethod
+ def to_string(self, s: str) -> str:
+ # TODO rewrite using cast_to(x, str)
+ "Provide SQL for casting a column to string"
+
+ @abstractmethod
+ def random(self) -> str:
+        "Provide SQL for generating a random number between 0..1"
+
+ @abstractmethod
+ def current_timestamp(self) -> str:
+ "Provide SQL for returning the current timestamp, aka now"
+
+ @abstractmethod
+ def offset_limit(self, offset: Optional[int] = None, limit: Optional[int] = None):
+ "Provide SQL fragment for limit and offset inside a select"
+
+ @abstractmethod
+ def explain_as_text(self, query: str) -> str:
+ "Provide SQL for explaining a query, returned as table(varchar)"
+
+ @abstractmethod
+ def timestamp_value(self, t: datetime) -> str:
+ "Provide SQL for the given timestamp value"
+
+ @abstractmethod
+ def set_timezone_to_utc(self) -> str:
+ "Provide SQL for setting the session timezone to UTC"
+
+ @abstractmethod
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ datetime_precision: int = None,
+ numeric_precision: int = None,
+ numeric_scale: int = None,
+ ) -> ColType:
+ "Parse type info as returned by the database"
+
+ @abstractmethod
+ def to_comparable(self, value: str, coltype: ColType) -> str:
+ """Ensure that the expression is comparable in ``IS DISTINCT FROM``."""
+
+
+from typing import TypeVar, Generic
+
+T_Dialect = TypeVar("T_Dialect", bound=AbstractDialect)
+
+
+class AbstractDatabase(Generic[T_Dialect]):
+ @property
+ @abstractmethod
+ def dialect(self) -> T_Dialect:
+ "The dialect of the database. Used internally by Database, and also available publicly."
+
+ @classmethod
+ @abstractmethod
+ def load_mixins(cls, *abstract_mixins) -> type:
+ "Extend the dialect with a list of mixins that implement the given abstract mixins."
+
+ @property
+ @abstractmethod
+ def CONNECT_URI_HELP(self) -> str:
+ "Example URI to show the user in help and error messages"
+
+ @property
+ @abstractmethod
+ def CONNECT_URI_PARAMS(self) -> List[str]:
+ "List of parameters given in the path of the URI"
+
+ @abstractmethod
+ def _query(self, sql_code: str) -> list:
+ "Send query to database and return result"
+
+ @abstractmethod
+ def query_table_schema(self, path: DbPath) -> Dict[str, tuple]:
+ """Query the table for its schema for table in 'path', and return {column: tuple}
+ where the tuple is (table_name, col_name, type_repr, datetime_precision?, numeric_precision?, numeric_scale?)
+
+ Note: This method exists instead of select_table_schema(), just because not all databases support
+ accessing the schema using a SQL query.
+ """
+
+ @abstractmethod
+ def select_table_unique_columns(self, path: DbPath) -> str:
+ "Provide SQL for selecting the names of unique columns in the table"
+
+ @abstractmethod
+ def query_table_unique_columns(self, path: DbPath) -> List[str]:
+ """Query the table for its unique columns for table in 'path', and return {column}"""
+
+ @abstractmethod
+ def _process_table_schema(
+ self, path: DbPath, raw_schema: Dict[str, tuple], filter_columns: Sequence[str], where: str = None
+ ):
+ """Process the result of query_table_schema().
+
+ Done in a separate step, to minimize the amount of processed columns.
+ Needed because processing each column may:
+ * throw errors and warnings
+ * query the database to sample values
+
+ """
+
+ @abstractmethod
+ def parse_table_name(self, name: str) -> DbPath:
+ "Parse the given table name into a DbPath"
+
+ @abstractmethod
+ def close(self):
+ "Close connection(s) to the database instance. Querying will stop functioning."
+
+ @property
+ @abstractmethod
+ def is_autocommit(self) -> bool:
+ "Return whether the database autocommits changes. When false, COMMIT statements are skipped."
+
+
+class AbstractTable(ABC):
+ @abstractmethod
+ def select(self, *exprs, distinct=False, **named_exprs) -> "AbstractTable":
+ """Choose new columns, based on the old ones. (aka Projection)
+
+ Parameters:
+ exprs: List of expressions to constitute the columns of the new table.
+ If not provided, returns all columns in source table (i.e. ``select *``)
+ distinct: 'select' or 'select distinct'
+ named_exprs: More expressions to constitute the columns of the new table, aliased to keyword name.
+
+ """
+ # XXX distinct=SKIP
+
+ @abstractmethod
+ def where(self, *exprs) -> "AbstractTable":
+ """Filter the rows, based on the given predicates. (aka Selection)"""
+
+ @abstractmethod
+ def order_by(self, *exprs) -> "AbstractTable":
+ """Order the rows lexicographically, according to the given expressions."""
+
+ @abstractmethod
+ def limit(self, limit: int) -> "AbstractTable":
+ """Stop yielding rows after the given limit. i.e. take the first 'n=limit' rows"""
+
+ @abstractmethod
+ def join(self, target) -> "AbstractTable":
+ """Join the current table with the target table, returning a new table containing both side-by-side.
+
+        When joining, it's recommended to use explicit table names, instead of `this`, in order to avoid potential name collisions.
+
+ Example:
+ ::
+
+ person = table('person')
+ city = table('city')
+
+ name_and_city = (
+ person
+ .join(city)
+ .on(person['city_id'] == city['id'])
+ .select(person['id'], city['name'])
+ )
+ """
+
+ @abstractmethod
+ def group_by(self, *keys):
+ """Behaves like in SQL, except for a small change in syntax:
+
+ A call to `.agg()` must follow every call to `.group_by()`.
+
+ Example:
+ ::
+
+ # SELECT a, sum(b) FROM tmp GROUP BY 1
+ table('tmp').group_by(this.a).agg(this.b.sum())
+
+                # SELECT a, sum(b) FROM tmp GROUP BY 1 HAVING (b > 10)
+ (table('tmp')
+ .group_by(this.a)
+ .agg(this.b.sum())
+ .having(this.b > 10)
+ )
+
+ """
+
+ @abstractmethod
+ def count(self) -> int:
+ """SELECT count() FROM self"""
+
+ @abstractmethod
+ def union(self, other: "ITable"):
+ """SELECT * FROM self UNION other"""
+
+ @abstractmethod
+ def union_all(self, other: "ITable"):
+ """SELECT * FROM self UNION ALL other"""
+
+ @abstractmethod
+ def minus(self, other: "ITable"):
+ """SELECT * FROM self EXCEPT other"""
+
+ @abstractmethod
+ def intersect(self, other: "ITable"):
+ """SELECT * FROM self INTERSECT other"""
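
To make the interface above concrete, here is a small composition sketch (the schema name is illustrative); the result stays an expression tree and only becomes SQL once a dialect compiles it, e.g. via `Database.query()`:

```python
from data_diff.sqeleton import table, this

q = (
    table("information_schema", "tables")
    .where(this.table_schema == "public")   # hypothetical schema filter
    .select(this.table_name)
    .order_by(this.table_name)
    .limit(20)
)
# `q` is not bound to any database yet; db.query(q, list) would compile and execute it.
```
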
diff --git a/data_diff/sqeleton/abcs/mixins.py b/data_diff/sqeleton/abcs/mixins.py
new file mode 100644
index 00000000..b07a7315
--- /dev/null
+++ b/data_diff/sqeleton/abcs/mixins.py
@@ -0,0 +1,180 @@
+from abc import ABC, abstractmethod
+from .database_types import Array, TemporalType, FractionalType, ColType_UUID, Boolean, ColType, String_UUID, JSON, Struct
+from .compiler import Compilable
+
+
+class AbstractMixin(ABC):
+ "A mixin for a database dialect"
+
+
+class AbstractMixin_NormalizeValue(AbstractMixin):
+
+ @abstractmethod
+ def to_comparable(self, value: str, coltype: ColType) -> str:
+ """Ensure that the expression is comparable in ``IS DISTINCT FROM``."""
+
+ @abstractmethod
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ """Creates an SQL expression, that converts 'value' to a normalized timestamp.
+
+ The returned expression must accept any SQL datetime/timestamp, and return a string.
+
+ Date format: ``YYYY-MM-DD HH:mm:SS.FFFFFF``
+
+ Precision of dates should be rounded up/down according to coltype.rounds
+ """
+
+ @abstractmethod
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ """Creates an SQL expression, that converts 'value' to a normalized number.
+
+ The returned expression must accept any SQL int/numeric/float, and return a string.
+
+ Floats/Decimals are expected in the format
+ "I.P"
+
+ Where I is the integer part of the number (as many digits as necessary),
+ and must be at least one digit (0).
+ P is the fractional digits, the amount of which is specified with
+ coltype.precision. Trailing zeroes may be necessary.
+ If P is 0, the dot is omitted.
+
+ Note: We use 'precision' differently than most databases. For decimals,
+        it's the same as ``numeric_scale``, and for floats, which use binary precision,
+ it can be calculated as ``log10(2**numeric_precision)``.
+ """
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ """Creates an SQL expression, that converts 'value' to either '0' or '1'."""
+ return self.to_string(value)
+
+ def normalize_uuid(self, value: str, coltype: ColType_UUID) -> str:
+ """Creates an SQL expression, that strips uuids of artifacts like whitespace."""
+ if isinstance(coltype, String_UUID):
+ return f"TRIM({value})"
+ return self.to_string(value)
+
+ def normalize_json(self, value: str, _coltype: JSON) -> str:
+ """Creates an SQL expression, that converts 'value' to its minified json string representation."""
+ return self.to_string(value)
+
+ def normalize_array(self, value: str, _coltype: Array) -> str:
+        """Creates an SQL expression, that serializes an array into a JSON string."""
+ return self.to_string(value)
+
+ def normalize_struct(self, value: str, _coltype: Struct) -> str:
+        """Creates an SQL expression, that serializes a typed struct into a JSON string."""
+ return self.to_string(value)
+
+ def normalize_value_by_type(self, value: str, coltype: ColType) -> str:
+ """Creates an SQL expression, that converts 'value' to a normalized representation.
+
+ The returned expression must accept any SQL value, and return a string.
+
+ The default implementation dispatches to a method according to `coltype`:
+
+ ::
+
+ TemporalType -> normalize_timestamp()
+ FractionalType -> normalize_number()
+ *else* -> to_string()
+
+ (`Integer` falls in the *else* category)
+
+ """
+ if isinstance(coltype, TemporalType):
+ return self.normalize_timestamp(value, coltype)
+ elif isinstance(coltype, FractionalType):
+ return self.normalize_number(value, coltype)
+ elif isinstance(coltype, ColType_UUID):
+ return self.normalize_uuid(value, coltype)
+ elif isinstance(coltype, Boolean):
+ return self.normalize_boolean(value, coltype)
+ elif isinstance(coltype, JSON):
+ return self.normalize_json(value, coltype)
+ elif isinstance(coltype, Array):
+ return self.normalize_array(value, coltype)
+ elif isinstance(coltype, Struct):
+ return self.normalize_struct(value, coltype)
+ return self.to_string(value)
+
+
+class AbstractMixin_MD5(AbstractMixin):
+    """Methods for calculating an MD5 hash as an integer."""
+
+ @abstractmethod
+ def md5_as_int(self, s: str) -> str:
+ "Provide SQL for computing md5 and returning an int"
+
+
+class AbstractMixin_Schema(AbstractMixin):
+ """Methods for querying the database schema
+
+ TODO: Move AbstractDatabase.query_table_schema() and friends over here
+ """
+
+ def table_information(self) -> Compilable:
+ "Query to return a table of schema information about existing tables"
+ raise NotImplementedError()
+
+ @abstractmethod
+ def list_tables(self, table_schema: str, like: Compilable = None) -> Compilable:
+ """Query to select the list of tables in the schema. (query return type: table[str])
+
+ If 'like' is specified, the value is applied to the table name, using the 'like' operator.
+ """
+
+
+class AbstractMixin_Regex(AbstractMixin):
+ @abstractmethod
+ def test_regex(self, string: Compilable, pattern: Compilable) -> Compilable:
+ """Tests whether the regex pattern matches the string. Returns a bool expression."""
+
+
+class AbstractMixin_RandomSample(AbstractMixin):
+ @abstractmethod
+ def random_sample_n(self, tbl: str, size: int) -> str:
+ """Take a random sample of the given size, i.e. return 'size' amount of rows"""
+
+ @abstractmethod
+ def random_sample_ratio_approx(self, tbl: str, ratio: float) -> str:
+ """Take a random sample of the approximate size determined by the ratio (0..1), where 0 means no rows, and 1 means all rows
+
+        i.e. the actual amount of rows returned may vary by standard deviation.
+ """
+
+ # def random_sample_ratio(self, table: AbstractTable, ratio: float):
+ # """Take a random sample of the size determined by the ratio (0..1), where 0 means no rows, and 1 means all rows
+ # """
+
+
+class AbstractMixin_TimeTravel(AbstractMixin):
+ @abstractmethod
+ def time_travel(
+ self,
+ table: Compilable,
+ before: bool = False,
+ timestamp: Compilable = None,
+ offset: Compilable = None,
+ statement: Compilable = None,
+ ) -> Compilable:
+ """Selects historical data from a table
+
+ Parameters:
+ table - The name of the table whose history we're querying
+ timestamp - A constant timestamp
+ offset - the time 'offset' seconds before now
+ statement - identifier for statement, e.g. query ID
+
+ Must specify exactly one of `timestamp`, `offset` or `statement`.
+ """
+
+
+class AbstractMixin_OptimizerHints(AbstractMixin):
+ @abstractmethod
+ def optimizer_hints(self, optimizer_hints: str) -> str:
+ """Creates a compatible optimizer_hints string
+
+ Parameters:
+ optimizer_hints - string of optimizer hints
+ """
diff --git a/data_diff/sqeleton/bound_exprs.py b/data_diff/sqeleton/bound_exprs.py
new file mode 100644
index 00000000..188efbca
--- /dev/null
+++ b/data_diff/sqeleton/bound_exprs.py
@@ -0,0 +1,97 @@
+"""Expressions bound to a specific database"""
+
+import inspect
+from functools import wraps
+from typing import Union, TYPE_CHECKING
+
+from runtype import dataclass
+
+from .abcs import AbstractDatabase, AbstractCompiler
+from .queries.ast_classes import ExprNode, ITable, TablePath, Compilable
+from .queries.api import table
+from .schema import create_schema
+
+
+@dataclass
+class BoundNode(ExprNode):
+ database: AbstractDatabase
+ node: Compilable
+
+ def __getattr__(self, attr):
+ value = getattr(self.node, attr)
+ if inspect.ismethod(value):
+
+ @wraps(value)
+ def bound_method(*args, **kw):
+ return BoundNode(self.database, value(*args, **kw))
+
+ return bound_method
+ return value
+
+ def query(self, res_type=list):
+ return self.database.query(self.node, res_type=res_type)
+
+ @property
+ def type(self):
+ return self.node.type
+
+ def compile(self, c: AbstractCompiler) -> str:
+ assert c.database is self.database
+ return self.node.compile(c)
+
+
+def bind_node(node, database):
+ return BoundNode(database, node)
+
+
+ExprNode.bind = bind_node
+
+
+@dataclass
+class BoundTable(BoundNode): # ITable
+ database: AbstractDatabase
+ node: TablePath
+
+ def with_schema(self, schema):
+ table_path = self.node.replace(schema=schema)
+ return self.replace(node=table_path)
+
+ def query_schema(self, *, columns=None, where=None, case_sensitive=True):
+ table_path = self.node
+
+ if table_path.schema:
+ return self
+
+ raw_schema = self.database.query_table_schema(table_path.path)
+ schema = self.database._process_table_schema(table_path.path, raw_schema, columns, where)
+ schema = create_schema(self.database, table_path, schema, case_sensitive)
+ return self.with_schema(schema)
+
+ @property
+ def schema(self):
+ return self.node.schema
+
+
+def bound_table(database: AbstractDatabase, table_path: Union[TablePath, str, tuple], **kw):
+ return BoundTable(database, table(table_path, **kw))
+
+
+# Database.table = bound_table
+
+# def test():
+# from . import connect
+# from .queries.api import table
+# d = connect("mysql://erez:qweqwe123@localhost/erez")
+# t = table(('Rating',))
+
+# b = BoundTable(d, t)
+# b2 = b.with_schema()
+
+# breakpoint()
+
+# test()
+
+if TYPE_CHECKING:
+
+ class BoundTable(BoundTable, TablePath):
+ pass
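
A short sketch of how the bound-expression layer above is meant to be used (URI and table name are illustrative); method calls on the bound node are proxied through `BoundNode.__getattr__`:

```python
from data_diff.sqeleton import connect
from data_diff.sqeleton.bound_exprs import bound_table

db = connect("mysql://localhost/db")      # assumed reachable database
ratings = bound_table(db, "ratings")      # hypothetical table name
ratings = ratings.query_schema()          # fetch and attach the table schema
first_rows = ratings.limit(5).query()     # limit() is proxied; query() runs it on `db`
```
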
diff --git a/data_diff/sqeleton/databases/__init__.py b/data_diff/sqeleton/databases/__init__.py
new file mode 100644
index 00000000..44a7e1c8
--- /dev/null
+++ b/data_diff/sqeleton/databases/__init__.py
@@ -0,0 +1,18 @@
+from .base import MD5_HEXDIGITS, CHECKSUM_HEXDIGITS, QueryError, ConnectError, BaseDialect, Database
+from ..abcs import DbPath, DbKey, DbTime
+from ._connect import Connect
+
+from .postgresql import PostgreSQL
+from .mysql import MySQL
+from .oracle import Oracle
+from .snowflake import Snowflake
+from .bigquery import BigQuery
+from .redshift import Redshift
+from .presto import Presto
+from .databricks import Databricks
+from .trino import Trino
+from .clickhouse import Clickhouse
+from .vertica import Vertica
+from .duckdb import DuckDB
+
+connect = Connect()
diff --git a/data_diff/sqeleton/databases/_connect.py b/data_diff/sqeleton/databases/_connect.py
new file mode 100644
index 00000000..f6e2c6f1
--- /dev/null
+++ b/data_diff/sqeleton/databases/_connect.py
@@ -0,0 +1,271 @@
+from typing import Type, Optional, Union, Dict
+from itertools import zip_longest
+from contextlib import suppress
+import dsnparse
+import toml
+
+from runtype import dataclass
+
+from ..abcs.mixins import AbstractMixin
+from ..utils import WeakCache, Self
+from .base import Database, ThreadedDatabase
+from .postgresql import PostgreSQL
+from .mysql import MySQL
+from .oracle import Oracle
+from .snowflake import Snowflake
+from .bigquery import BigQuery
+from .redshift import Redshift
+from .presto import Presto
+from .databricks import Databricks
+from .trino import Trino
+from .clickhouse import Clickhouse
+from .vertica import Vertica
+from .duckdb import DuckDB
+
+
+@dataclass
+class MatchUriPath:
+ database_cls: Type[Database]
+
+ def match_path(self, dsn):
+ help_str = self.database_cls.CONNECT_URI_HELP
+ params = self.database_cls.CONNECT_URI_PARAMS
+ kwparams = self.database_cls.CONNECT_URI_KWPARAMS
+
+ dsn_dict = dict(dsn.query)
+ matches = {}
+ for param, arg in zip_longest(params, dsn.paths):
+ if param is None:
+ raise ValueError(f"Too many parts to path. Expected format: {help_str}")
+
+ optional = param.endswith("?")
+ param = param.rstrip("?")
+
+ if arg is None:
+ try:
+ arg = dsn_dict.pop(param)
+ except KeyError:
+ if not optional:
+ raise ValueError(f"URI must specify '{param}'. Expected format: {help_str}")
+
+ arg = None
+
+ assert param and param not in matches
+ matches[param] = arg
+
+ for param in kwparams:
+ try:
+ arg = dsn_dict.pop(param)
+ except KeyError:
+ raise ValueError(f"URI must specify '{param}'. Expected format: {help_str}")
+
+ assert param and arg and param not in matches, (param, arg, matches.keys())
+ matches[param] = arg
+
+ for param, value in dsn_dict.items():
+ if param in matches:
+ raise ValueError(
+ f"Parameter '{param}' already provided as positional argument. Expected format: {help_str}"
+ )
+
+ matches[param] = value
+
+ return matches
+
+
+DATABASE_BY_SCHEME = {
+ "postgresql": PostgreSQL,
+ "mysql": MySQL,
+ "oracle": Oracle,
+ "redshift": Redshift,
+ "snowflake": Snowflake,
+ "presto": Presto,
+ "bigquery": BigQuery,
+ "databricks": Databricks,
+ "duckdb": DuckDB,
+ "trino": Trino,
+ "clickhouse": Clickhouse,
+ "vertica": Vertica,
+}
+
+
+class Connect:
+ """Provides methods for connecting to a supported database using a URL or connection dict."""
+
+ def __init__(self, database_by_scheme: Dict[str, Database] = DATABASE_BY_SCHEME):
+ self.database_by_scheme = database_by_scheme
+ self.match_uri_path = {name: MatchUriPath(cls) for name, cls in database_by_scheme.items()}
+ self.conn_cache = WeakCache()
+
+ def for_databases(self, *dbs):
+ database_by_scheme = {k: db for k, db in self.database_by_scheme.items() if k in dbs}
+ return type(self)(database_by_scheme)
+
+ def load_mixins(self, *abstract_mixins: AbstractMixin) -> Self:
+ "Extend all the databases with a list of mixins that implement the given abstract mixins."
+ database_by_scheme = {k: db.load_mixins(*abstract_mixins) for k, db in self.database_by_scheme.items()}
+ return type(self)(database_by_scheme)
+
+ def connect_to_uri(self, db_uri: str, thread_count: Optional[int] = 1) -> Database:
+ """Connect to the given database uri
+
+ thread_count determines the max number of worker threads per database,
+ if relevant. None means no limit.
+
+ Parameters:
+ db_uri (str): The URI for the database to connect
+ thread_count (int, optional): Size of the threadpool. Ignored by cloud databases. (default: 1)
+
+ Note: For non-cloud databases, a low thread-pool size may be a performance bottleneck.
+
+ Supported schemes:
+ - postgresql
+ - mysql
+ - oracle
+ - snowflake
+ - bigquery
+ - redshift
+ - presto
+ - databricks
+ - trino
+ - clickhouse
+ - vertica
+ - duckdb
+ """
+
+ dsn = dsnparse.parse(db_uri)
+ if len(dsn.schemes) > 1:
+ raise NotImplementedError("No support for multiple schemes")
+ (scheme,) = dsn.schemes
+
+ if scheme == "toml":
+ toml_path = dsn.path or dsn.host
+ database = dsn.fragment
+ if not database:
+ raise ValueError("Must specify a database name, e.g. 'toml://path#database'. ")
+ with open(toml_path) as f:
+ config = toml.load(f)
+ try:
+ conn_dict = config["database"][database]
+ except KeyError:
+ raise ValueError(f"Cannot find database config named '{database}'.")
+ return self.connect_with_dict(conn_dict, thread_count)
+
+ try:
+ matcher = self.match_uri_path[scheme]
+ except KeyError:
+ raise NotImplementedError(f"Scheme '{scheme}' currently not supported")
+
+ cls = matcher.database_cls
+
+ if scheme == "databricks":
+ assert not dsn.user
+ kw = {}
+ kw["access_token"] = dsn.password
+ kw["http_path"] = dsn.path
+ kw["server_hostname"] = dsn.host
+ kw.update(dsn.query)
+ elif scheme == "duckdb":
+ kw = {}
+ kw["filepath"] = dsn.dbname
+ kw["dbname"] = dsn.user
+ else:
+ kw = matcher.match_path(dsn)
+
+ if scheme == "bigquery":
+ kw["project"] = dsn.host
+ return cls(**kw)
+
+ if scheme == "snowflake":
+ kw["account"] = dsn.host
+ assert not dsn.port
+ kw["user"] = dsn.user
+ kw["password"] = dsn.password
+ else:
+ kw["host"] = dsn.host
+ kw["port"] = dsn.port
+ kw["user"] = dsn.user
+ if dsn.password:
+ kw["password"] = dsn.password
+
+ kw = {k: v for k, v in kw.items() if v is not None}
+
+ if issubclass(cls, ThreadedDatabase):
+ db = cls(thread_count=thread_count, **kw)
+ else:
+ db = cls(**kw)
+
+ return self._connection_created(db)
+
+ def connect_with_dict(self, d, thread_count):
+ d = dict(d)
+ driver = d.pop("driver")
+ try:
+ matcher = self.match_uri_path[driver]
+ except KeyError:
+ raise NotImplementedError(f"Driver '{driver}' currently not supported")
+
+ cls = matcher.database_cls
+ if issubclass(cls, ThreadedDatabase):
+ db = cls(thread_count=thread_count, **d)
+ else:
+ db = cls(**d)
+
+ return self._connection_created(db)
+
+ def _connection_created(self, db):
+ "Nop function to be overridden by subclasses."
+ return db
+
+ def __call__(self, db_conf: Union[str, dict], thread_count: Optional[int] = 1, shared: bool = True) -> Database:
+ """Connect to a database using the given database configuration.
+
+ Configuration can be given either as a URI string, or as a dict of {option: value}.
+
+ The dictionary configuration uses the same keys as the TOML 'database' definition given with --conf.
+
+ thread_count determines the max number of worker threads per database,
+ if relevant. None means no limit.
+
+ Parameters:
+ db_conf (str | dict): The configuration for the database to connect. URI or dict.
+ thread_count (int, optional): Size of the threadpool. Ignored by cloud databases. (default: 1)
+ shared (bool): Whether to cache and return the same connection for the same db_conf. (default: True)
+
+ Note: For non-cloud databases, a low thread-pool size may be a performance bottleneck.
+
+ Supported drivers:
+ - postgresql
+ - mysql
+ - oracle
+ - snowflake
+ - bigquery
+ - redshift
+ - presto
+ - databricks
+ - trino
+ - clickhouse
+ - vertica
+
+ Example:
+ >>> connect("mysql://localhost/db")
+
+ >>> connect({"driver": "mysql", "host": "localhost", "database": "db"})
+
+ """
+ if shared:
+ with suppress(KeyError):
+ conn = self.conn_cache.get(db_conf)
+ if not conn.is_closed:
+ return conn
+
+ if isinstance(db_conf, str):
+ conn = self.connect_to_uri(db_conf, thread_count)
+ elif isinstance(db_conf, dict):
+ conn = self.connect_with_dict(db_conf, thread_count)
+ else:
+ raise TypeError(f"db configuration must be a URI string or a dictionary. Instead got '{db_conf}'.")
+
+ if shared:
+ self.conn_cache.add(db_conf, conn)
+ return conn
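
The `toml://` branch above looks up a `[database.<name>]` table in the referenced file. A hedged sketch, with an assumed file name and illustrative connection values (the keys mirror the dict form accepted by `connect_with_dict`):

```python
# Assumed contents of ./databases.toml:
#
#   [database.analytics]
#   driver = "postgresql"
#   host = "localhost"
#   user = "analytics_user"
#   password = "..."
#   port = 5432
#   dbname = "analytics"

from data_diff.sqeleton import connect

db = connect("toml://databases.toml#analytics", thread_count=4)
```
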
diff --git a/data_diff/sqeleton/databases/base.py b/data_diff/sqeleton/databases/base.py
new file mode 100644
index 00000000..8ef01373
--- /dev/null
+++ b/data_diff/sqeleton/databases/base.py
@@ -0,0 +1,602 @@
+from datetime import datetime
+import math
+import sys
+import logging
+from typing import Any, Callable, Dict, Generator, Tuple, Optional, Sequence, Type, List, Union, TypeVar, TYPE_CHECKING
+from functools import partial, wraps
+from concurrent.futures import ThreadPoolExecutor
+import threading
+from abc import abstractmethod
+from uuid import UUID
+import decimal
+
+from runtype import dataclass
+
+from ..utils import is_uuid, safezip, Self
+from ..queries import Expr, Compiler, table, Select, SKIP, Explain, Code, this
+from ..queries.ast_classes import Random
+from ..abcs.database_types import (
+ AbstractDatabase,
+ Array,
+ Struct,
+ AbstractDialect,
+ AbstractTable,
+ ColType,
+ Integer,
+ Decimal,
+ Float,
+ Native_UUID,
+ String_UUID,
+ String_Alphanum,
+ String_VaryingAlphanum,
+ TemporalType,
+ UnknownColType,
+ TimestampTZ,
+ Text,
+ DbTime,
+ DbPath,
+ Boolean,
+ JSON
+)
+from ..abcs.mixins import Compilable
+from ..abcs.mixins import (
+ AbstractMixin_Schema,
+ AbstractMixin_RandomSample,
+ AbstractMixin_NormalizeValue,
+ AbstractMixin_OptimizerHints,
+)
+from ..bound_exprs import bound_table
+
+logger = logging.getLogger("database")
+
+
+def parse_table_name(t):
+ return tuple(t.split("."))
+
+
+def import_helper(package: str = None, text=""):
+ def dec(f):
+ @wraps(f)
+ def _inner():
+ try:
+ return f()
+ except ModuleNotFoundError as e:
+ s = text
+ if package:
+ s += f"You can install it using 'pip install data_diff[{package}]'."
+ raise ModuleNotFoundError(f"{e}\n\n{s}\n")
+
+ return _inner
+
+ return dec
+
+
+class ConnectError(Exception):
+ pass
+
+
+class QueryError(Exception):
+ pass
+
+
+def _one(seq):
+ (x,) = seq
+ return x
+
+
+class ThreadLocalInterpreter:
+    """An interpreter used to execute a sequence of queries within the same thread and cursor.
+
+ Useful for cursor-sensitive operations, such as creating a temporary table.
+ """
+
+ def __init__(self, compiler: Compiler, gen: Generator):
+ self.gen = gen
+ self.compiler = compiler
+
+ def apply_queries(self, callback: Callable[[str], Any]):
+ q: Expr = next(self.gen)
+ while True:
+ sql = self.compiler.compile(q)
+ logger.debug("Running SQL (%s-TL): %s", self.compiler.database.name, sql)
+ try:
+ try:
+ res = callback(sql) if sql is not SKIP else SKIP
+ except Exception as e:
+ q = self.gen.throw(type(e), e)
+ else:
+ q = self.gen.send(res)
+ except StopIteration:
+ break
+
+
+def apply_query(callback: Callable[[str], Any], sql_code: Union[str, ThreadLocalInterpreter]) -> list:
+ if isinstance(sql_code, ThreadLocalInterpreter):
+ return sql_code.apply_queries(callback)
+ else:
+ return callback(sql_code)
+
+
+class Mixin_Schema(AbstractMixin_Schema):
+ def table_information(self) -> Compilable:
+ return table("information_schema", "tables")
+
+ def list_tables(self, table_schema: str, like: Compilable = None) -> Compilable:
+ return (
+ self.table_information()
+ .where(
+ this.table_schema == table_schema,
+ this.table_name.like(like) if like is not None else SKIP,
+ this.table_type == "BASE TABLE",
+ )
+ .select(this.table_name)
+ )
+
+
+class Mixin_RandomSample(AbstractMixin_RandomSample):
+ def random_sample_n(self, tbl: AbstractTable, size: int) -> AbstractTable:
+ # TODO use a more efficient algorithm, when the table count is known
+ return tbl.order_by(Random()).limit(size)
+
+ def random_sample_ratio_approx(self, tbl: AbstractTable, ratio: float) -> AbstractTable:
+ return tbl.where(Random() < ratio)
+
+
+class Mixin_OptimizerHints(AbstractMixin_OptimizerHints):
+ def optimizer_hints(self, hints: str) -> str:
+ return f"/*+ {hints} */ "
+
+
+class BaseDialect(AbstractDialect):
+ SUPPORTS_PRIMARY_KEY = False
+ SUPPORTS_INDEXES = False
+ TYPE_CLASSES: Dict[str, type] = {}
+ MIXINS = frozenset()
+
+ PLACEHOLDER_TABLE = None # Used for Oracle
+
+ def offset_limit(self, offset: Optional[int] = None, limit: Optional[int] = None):
+ if offset:
+ raise NotImplementedError("No support for OFFSET in query")
+
+ return f"LIMIT {limit}"
+
+ def concat(self, items: List[str]) -> str:
+ assert len(items) > 1
+ joined_exprs = ", ".join(items)
+ return f"concat({joined_exprs})"
+
+ def to_comparable(self, value: str, coltype: ColType) -> str:
+ """Ensure that the expression is comparable in ``IS DISTINCT FROM``."""
+ return value
+
+ def is_distinct_from(self, a: str, b: str) -> str:
+ return f"{a} is distinct from {b}"
+
+ def timestamp_value(self, t: DbTime) -> str:
+ return f"'{t.isoformat()}'"
+
+ def random(self) -> str:
+ return "random()"
+
+ def current_timestamp(self) -> str:
+ return "current_timestamp()"
+
+ def explain_as_text(self, query: str) -> str:
+ return f"EXPLAIN {query}"
+
+ def _constant_value(self, v):
+ if v is None:
+ return "NULL"
+ elif isinstance(v, str):
+ return f"'{v}'"
+ elif isinstance(v, datetime):
+ return self.timestamp_value(v)
+ elif isinstance(v, UUID):
+ return f"'{v}'"
+ elif isinstance(v, decimal.Decimal):
+ return str(v)
+ elif isinstance(v, bytearray):
+ return f"'{v.decode()}'"
+ elif isinstance(v, Code):
+ return v.code
+ return repr(v)
+
+ def constant_values(self, rows) -> str:
+ values = ", ".join("(%s)" % ", ".join(self._constant_value(v) for v in row) for row in rows)
+ return f"VALUES {values}"
+
+ def type_repr(self, t) -> str:
+ if isinstance(t, str):
+ return t
+ elif isinstance(t, TimestampTZ):
+ return f"TIMESTAMP({min(t.precision, DEFAULT_DATETIME_PRECISION)})"
+ return {
+ int: "INT",
+ str: "VARCHAR",
+ bool: "BOOLEAN",
+ float: "FLOAT",
+ datetime: "TIMESTAMP",
+ }[t]
+
+ def _parse_type_repr(self, type_repr: str) -> Optional[Type[ColType]]:
+ return self.TYPE_CLASSES.get(type_repr)
+
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ datetime_precision: int = None,
+ numeric_precision: int = None,
+ numeric_scale: int = None,
+ ) -> ColType:
+ """ """
+
+ cls = self._parse_type_repr(type_repr)
+ if cls is None:
+ return UnknownColType(type_repr)
+
+ if issubclass(cls, TemporalType):
+ return cls(
+ precision=datetime_precision if datetime_precision is not None else DEFAULT_DATETIME_PRECISION,
+ rounds=self.ROUNDS_ON_PREC_LOSS,
+ )
+
+ elif issubclass(cls, Integer):
+ return cls()
+
+ elif issubclass(cls, Boolean):
+ return cls()
+
+ elif issubclass(cls, Decimal):
+ if numeric_scale is None:
+ numeric_scale = 0 # Needed for Oracle.
+ return cls(precision=numeric_scale)
+
+ elif issubclass(cls, Float):
+ # assert numeric_scale is None
+ return cls(
+ precision=self._convert_db_precision_to_digits(
+ numeric_precision if numeric_precision is not None else DEFAULT_NUMERIC_PRECISION
+ )
+ )
+
+ elif issubclass(cls, (JSON, Array, Struct, Text, Native_UUID)):
+ return cls()
+
+ raise TypeError(f"Parsing {type_repr} returned an unknown type '{cls}'.")
+
+ def _convert_db_precision_to_digits(self, p: int) -> int:
+ """Convert from binary precision, used by floats, to decimal precision."""
+ # See: https://en.wikipedia.org/wiki/Single-precision_floating-point_format
+ return math.floor(math.log(2**p, 10))
+
+ @classmethod
+ def load_mixins(cls, *abstract_mixins) -> "Self":
+ mixins = {m for m in cls.MIXINS if issubclass(m, abstract_mixins)}
+
+ class _DialectWithMixins(cls, *mixins, *abstract_mixins):
+ pass
+
+ _DialectWithMixins.__name__ = cls.__name__
+ return _DialectWithMixins()
+
+
+T = TypeVar("T", bound=BaseDialect)
+
+
+@dataclass
+class QueryResult:
+ rows: list
+ columns: list = None
+
+ def __iter__(self):
+ return iter(self.rows)
+
+ def __len__(self):
+ return len(self.rows)
+
+ def __getitem__(self, i):
+ return self.rows[i]
+
+
+class Database(AbstractDatabase[T]):
+ """Base abstract class for databases.
+
+    Used for providing connection code and implementation-specific SQL utilities.
+
+    Instantiated using :meth:`~data_diff.sqeleton.connect`
+ """
+
+ default_schema: str = None
+ SUPPORTS_ALPHANUMS = True
+ SUPPORTS_UNIQUE_CONSTAINT = False
+
+ CONNECT_URI_KWPARAMS = []
+
+ _interactive = False
+ is_closed = False
+
+ @property
+ def name(self):
+ return type(self).__name__
+
+ def compile(self, sql_ast):
+ compiler = Compiler(self)
+ return compiler.compile(sql_ast)
+
+ def query(self, sql_ast: Union[Expr, Generator], res_type: type = None):
+ """Query the given SQL code/AST, and attempt to convert the result to type 'res_type'
+
+ If given a generator, it will execute all the yielded sql queries with the same thread and cursor.
+        The results of the queries are returned by the `yield` stmt (using the .send() mechanism).
+ It's a cleaner approach than exposing cursors, but may not be enough in all cases.
+ """
+
+ compiler = Compiler(self)
+ if isinstance(sql_ast, Generator):
+ sql_code = ThreadLocalInterpreter(compiler, sql_ast)
+ elif isinstance(sql_ast, list):
+ for i in sql_ast[:-1]:
+ self.query(i)
+ return self.query(sql_ast[-1], res_type)
+ else:
+ if isinstance(sql_ast, str):
+ sql_code = sql_ast
+ else:
+ if res_type is None:
+ res_type = sql_ast.type
+ sql_code = compiler.compile(sql_ast)
+ if sql_code is SKIP:
+ return SKIP
+
+ logger.debug("Running SQL (%s): %s", self.name, sql_code)
+
+ if self._interactive and isinstance(sql_ast, Select):
+ explained_sql = compiler.compile(Explain(sql_ast))
+ explain = self._query(explained_sql)
+ for row in explain:
+                # Most databases return a 1-tuple. Presto returns a string
+ if isinstance(row, tuple):
+ (row,) = row
+ logger.debug("EXPLAIN: %s", row)
+ answer = input("Continue? [y/n] ")
+ if answer.lower() not in ["y", "yes"]:
+ sys.exit(1)
+
+ res = self._query(sql_code)
+ if res_type is list:
+ return list(res)
+ elif res_type is int:
+ if not res:
+ raise ValueError("Query returned 0 rows, expected 1")
+ row = _one(res)
+ if not row:
+ raise ValueError("Row is empty, expected 1 column")
+ res = _one(row)
+ if res is None: # May happen due to sum() of 0 items
+ return None
+ return int(res)
+ elif res_type is datetime:
+ res = _one(_one(res))
+ if isinstance(res, str):
+ res = datetime.fromisoformat(res[:23]) # TODO use a better parsing method
+ return res
+ elif res_type is tuple:
+ assert len(res) == 1, (sql_code, res)
+ return res[0]
+ elif getattr(res_type, "__origin__", None) is list and len(res_type.__args__) == 1:
+ if res_type.__args__ in ((int,), (str,)):
+ return [_one(row) for row in res]
+ elif res_type.__args__ in [(Tuple,), (tuple,)]:
+ return [tuple(row) for row in res]
+ elif res_type.__args__ == (dict,):
+ return [dict(safezip(res.columns, row)) for row in res]
+ else:
+ raise ValueError(res_type)
+ return res
+
+ def enable_interactive(self):
+ self._interactive = True
+
+ def select_table_schema(self, path: DbPath) -> str:
+ """Provide SQL for selecting the table schema as (name, type, date_prec, num_prec)"""
+ schema, name = self._normalize_table_path(path)
+
+ return (
+ "SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale "
+ "FROM information_schema.columns "
+ f"WHERE table_name = '{name}' AND table_schema = '{schema}'"
+ )
+
+ def query_table_schema(self, path: DbPath) -> Dict[str, tuple]:
+ rows = self.query(self.select_table_schema(path), list)
+ if not rows:
+ raise RuntimeError(f"{self.name}: Table '{'.'.join(path)}' does not exist, or has no columns")
+
+ d = {r[0]: r for r in rows}
+ assert len(d) == len(rows)
+ return d
+
+ def select_table_unique_columns(self, path: DbPath) -> str:
+ schema, name = self._normalize_table_path(path)
+
+ return (
+ "SELECT column_name "
+ "FROM information_schema.key_column_usage "
+ f"WHERE table_name = '{name}' AND table_schema = '{schema}'"
+ )
+
+ def query_table_unique_columns(self, path: DbPath) -> List[str]:
+ if not self.SUPPORTS_UNIQUE_CONSTAINT:
+ raise NotImplementedError("This database doesn't support 'unique' constraints")
+ res = self.query(self.select_table_unique_columns(path), List[str])
+ return list(res)
+
+ def _process_table_schema(
+ self, path: DbPath, raw_schema: Dict[str, tuple], filter_columns: Sequence[str] = None, where: str = None
+ ):
+ if filter_columns is None:
+ filtered_schema = raw_schema
+ else:
+ accept = {i.lower() for i in filter_columns}
+ filtered_schema = {name: row for name, row in raw_schema.items() if name.lower() in accept}
+
+ col_dict = {row[0]: self.dialect.parse_type(path, *row) for _name, row in filtered_schema.items()}
+
+ self._refine_coltypes(path, col_dict, where)
+
+ # Return a dict of form {name: type} after normalization
+ return col_dict
+
+ def _refine_coltypes(self, table_path: DbPath, col_dict: Dict[str, ColType], where: str = None, sample_size=64):
+ """Refine the types in the column dict, by querying the database for a sample of their values
+
+ 'where' restricts the rows to be sampled.
+ """
+
+ text_columns = [k for k, v in col_dict.items() if isinstance(v, Text)]
+ if not text_columns:
+ return
+
+ if isinstance(self.dialect, AbstractMixin_NormalizeValue):
+ fields = [Code(self.dialect.normalize_uuid(self.dialect.quote(c), String_UUID())) for c in text_columns]
+ else:
+ fields = this[text_columns]
+
+ samples_by_row = self.query(
+ table(*table_path).select(*fields).where(Code(where) if where else SKIP).limit(sample_size), list
+ )
+ if not samples_by_row:
+ raise ValueError(f"Table {table_path} is empty.")
+
+ samples_by_col = list(zip(*samples_by_row))
+
+ for col_name, samples in safezip(text_columns, samples_by_col):
+ uuid_samples = [s for s in samples if s and is_uuid(s)]
+
+ if uuid_samples:
+ if len(uuid_samples) != len(samples):
+ logger.warning(
+ f"Mixed UUID/Non-UUID values detected in column {'.'.join(table_path)}.{col_name}, disabling UUID support."
+ )
+ else:
+ assert col_name in col_dict
+ col_dict[col_name] = String_UUID()
+ continue
+
+ if self.SUPPORTS_ALPHANUMS: # Anything but MySQL (so far)
+ alphanum_samples = [s for s in samples if String_Alphanum.test_value(s)]
+ if alphanum_samples:
+ if len(alphanum_samples) != len(samples):
+ logger.debug(
+ f"Mixed Alphanum/Non-Alphanum values detected in column {'.'.join(table_path)}.{col_name}. It cannot be used as a key."
+ )
+ else:
+ assert col_name in col_dict
+ col_dict[col_name] = String_VaryingAlphanum()
+
+ # @lru_cache()
+ # def get_table_schema(self, path: DbPath) -> Dict[str, ColType]:
+ # return self.query_table_schema(path)
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 1:
+ return self.default_schema, path[0]
+ elif len(path) == 2:
+ return path
+
+ raise ValueError(f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected form: schema.table")
+
+ def parse_table_name(self, name: str) -> DbPath:
+ return parse_table_name(name)
+
+ def _query_cursor(self, c, sql_code: str) -> QueryResult:
+ assert isinstance(sql_code, str), sql_code
+ try:
+ c.execute(sql_code)
+ if sql_code.lower().startswith(("select", "explain", "show")):
+ columns = [col[0] for col in c.description]
+ return QueryResult(c.fetchall(), columns)
+ except Exception as _e:
+ # logger.exception(e)
+ # logger.error(f'Caused by SQL: {sql_code}')
+ raise
+
+ def _query_conn(self, conn, sql_code: Union[str, ThreadLocalInterpreter]) -> QueryResult:
+ c = conn.cursor()
+ callback = partial(self._query_cursor, c)
+ return apply_query(callback, sql_code)
+
+ def close(self):
+ self.is_closed = True
+ return super().close()
+
+ def list_tables(self, tables_like, schema=None):
+ return self.query(self.dialect.list_tables(schema or self.default_schema, tables_like))
+
+ def table(self, *path, **kw):
+ return bound_table(self, path, **kw)
+
+ @classmethod
+ def load_mixins(cls, *abstract_mixins) -> type:
+ class _DatabaseWithMixins(cls):
+ dialect = cls.dialect.load_mixins(*abstract_mixins)
+
+ _DatabaseWithMixins.__name__ = cls.__name__
+ return _DatabaseWithMixins
+
+
+class ThreadedDatabase(Database):
+ """Access the database through singleton threads.
+
+ Used for database connectors that do not support sharing their connection between different threads.
+ """
+
+ def __init__(self, thread_count=1):
+ self._init_error = None
+ self._queue = ThreadPoolExecutor(thread_count, initializer=self.set_conn)
+ self.thread_local = threading.local()
+ logger.info(f"[{self.name}] Starting a threadpool, size={thread_count}.")
+
+ def set_conn(self):
+ assert not hasattr(self.thread_local, "conn")
+ try:
+ self.thread_local.conn = self.create_connection()
+ except Exception as e:
+ self._init_error = e
+
+ def _query(self, sql_code: Union[str, ThreadLocalInterpreter]) -> QueryResult:
+ r = self._queue.submit(self._query_in_worker, sql_code)
+ return r.result()
+
+ def _query_in_worker(self, sql_code: Union[str, ThreadLocalInterpreter]):
+ "This method runs in a worker thread"
+ if self._init_error:
+ raise self._init_error
+ return self._query_conn(self.thread_local.conn, sql_code)
+
+ @abstractmethod
+ def create_connection(self):
+ "Return a connection instance, that supports the .cursor() method."
+
+ def close(self):
+ super().close()
+ self._queue.shutdown()
+
+ @property
+ def is_autocommit(self) -> bool:
+ return False
+
+
+CHECKSUM_HEXDIGITS = 15 # Must be 15 or lower, otherwise SUM() overflows
+MD5_HEXDIGITS = 32
+
+_CHECKSUM_BITSIZE = CHECKSUM_HEXDIGITS << 2
+CHECKSUM_MASK = (2**_CHECKSUM_BITSIZE) - 1
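+# For example: with CHECKSUM_HEXDIGITS = 15, _CHECKSUM_BITSIZE = 15 * 4 = 60,
+# so CHECKSUM_MASK == 2**60 - 1 and each per-row checksum fits in 60 bits.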
+
+DEFAULT_DATETIME_PRECISION = 6
+DEFAULT_NUMERIC_PRECISION = 24
+
+TIMESTAMP_PRECISION_POS = 20 # len("2022-06-03 12:24:35.") == 20
diff --git a/data_diff/sqeleton/databases/bigquery.py b/data_diff/sqeleton/databases/bigquery.py
new file mode 100644
index 00000000..c2090e5c
--- /dev/null
+++ b/data_diff/sqeleton/databases/bigquery.py
@@ -0,0 +1,276 @@
+import re
+from typing import Any, List, Union
+from ..abcs.database_types import (
+ ColType,
+ Array,
+ JSON,
+ Struct,
+ Timestamp,
+ Datetime,
+ Integer,
+ Decimal,
+ Float,
+ Text,
+ DbPath,
+ FractionalType,
+ TemporalType,
+ Boolean,
+ UnknownColType,
+)
+from ..abcs.mixins import (
+ AbstractMixin_MD5,
+ AbstractMixin_NormalizeValue,
+ AbstractMixin_Schema,
+ AbstractMixin_TimeTravel,
+)
+from ..abcs import Compilable
+from ..queries import this, table, SKIP, code
+from .base import BaseDialect, Database, import_helper, parse_table_name, ConnectError, apply_query, QueryResult
+from .base import TIMESTAMP_PRECISION_POS, ThreadLocalInterpreter, Mixin_RandomSample
+
+
+@import_helper(text="Please install BigQuery and configure your google-cloud access.")
+def import_bigquery():
+ from google.cloud import bigquery
+
+ return bigquery
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"cast(cast( ('0x' || substr(TO_HEX(md5({s})), 18)) as int64) as numeric)"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ timestamp = f"timestamp_micros(cast(round(unix_micros(cast({value} as timestamp))/1000000, {coltype.precision})*1000000 as int))"
+ return f"FORMAT_TIMESTAMP('%F %H:%M:%E6S', {timestamp})"
+
+ if coltype.precision == 0:
+ return f"FORMAT_TIMESTAMP('%F %H:%M:%S.000000', {value})"
+ elif coltype.precision == 6:
+ return f"FORMAT_TIMESTAMP('%F %H:%M:%E6S', {value})"
+
+ timestamp6 = f"FORMAT_TIMESTAMP('%F %H:%M:%E6S', {value})"
+ return (
+ f"RPAD(LEFT({timestamp6}, {TIMESTAMP_PRECISION_POS+coltype.precision}), {TIMESTAMP_PRECISION_POS+6}, '0')"
+ )
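+    # Rough sketch of the padding step: with precision=3 and a formatted value of
+    # '2022-06-03 12:24:35.123456', LEFT(..., 20+3) keeps '2022-06-03 12:24:35.123'
+    # and RPAD(..., 26, '0') restores a fixed width: '2022-06-03 12:24:35.123000'.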
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return f"format('%.{coltype.precision}f', {value})"
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"cast({value} as int)")
+
+ def normalize_json(self, value: str, _coltype: JSON) -> str:
+ # BigQuery is unable to compare arrays & structs with ==/!=/distinct from, e.g.:
+ # Got error: 400 Grouping is not defined for arguments of type ARRAY at …
+ # So we do the best effort and compare it as strings, hoping that the JSON forms
+ # match on both sides: i.e. have properly ordered keys, same spacing, same quotes, etc.
+ return f"to_json_string({value})"
+
+ def normalize_array(self, value: str, _coltype: Array) -> str:
+ # BigQuery is unable to compare arrays & structs with ==/!=/distinct from, e.g.:
+ # Got error: 400 Grouping is not defined for arguments of type ARRAY at …
+ # So we do the best effort and compare it as strings, hoping that the JSON forms
+ # match on both sides: i.e. have properly ordered keys, same spacing, same quotes, etc.
+ return f"to_json_string({value})"
+
+ def normalize_struct(self, value: str, _coltype: Struct) -> str:
+ # BigQuery is unable to compare arrays & structs with ==/!=/distinct from, e.g.:
+ # Got error: 400 Grouping is not defined for arguments of type ARRAY at …
+ # So we do the best effort and compare it as strings, hoping that the JSON forms
+ # match on both sides: i.e. have properly ordered keys, same spacing, same quotes, etc.
+ return f"to_json_string({value})"
+
+
+class Mixin_Schema(AbstractMixin_Schema):
+ def list_tables(self, table_schema: str, like: Compilable = None) -> Compilable:
+ return (
+ table(table_schema, "INFORMATION_SCHEMA", "TABLES")
+ .where(
+ this.table_schema == table_schema,
+ this.table_name.like(like) if like is not None else SKIP,
+ this.table_type == "BASE TABLE",
+ )
+ .select(this.table_name)
+ )
+
+
+class Mixin_TimeTravel(AbstractMixin_TimeTravel):
+ def time_travel(
+ self,
+ table: Compilable,
+ before: bool = False,
+ timestamp: Compilable = None,
+ offset: Compilable = None,
+ statement: Compilable = None,
+ ) -> Compilable:
+ if before:
+ raise NotImplementedError("before=True not supported for BigQuery time-travel")
+
+ if statement is not None:
+ raise NotImplementedError("BigQuery time-travel doesn't support querying by statement id")
+
+ if timestamp is not None:
+ assert offset is None
+ return code("{table} FOR SYSTEM_TIME AS OF {timestamp}", table=table, timestamp=timestamp)
+
+ assert offset is not None
+ return code(
+ "{table} FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {offset} HOUR);",
+ table=table,
+ offset=offset,
+ )
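+    # Illustrative rendering (offset assumed to be an integer number of hours):
+    #   <table expr> FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR);
+    # whereas passing `timestamp` produces `... FOR SYSTEM_TIME AS OF <timestamp>`.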
+
+
+class Dialect(BaseDialect, Mixin_Schema):
+ name = "BigQuery"
+ ROUNDS_ON_PREC_LOSS = False # Technically BigQuery doesn't allow implicit rounding or truncation
+ TYPE_CLASSES = {
+ # Dates
+ "TIMESTAMP": Timestamp,
+ "DATETIME": Datetime,
+ # Numbers
+ "INT64": Integer,
+ "INT32": Integer,
+ "NUMERIC": Decimal,
+ "BIGNUMERIC": Decimal,
+ "FLOAT64": Float,
+ "FLOAT32": Float,
+ "STRING": Text,
+ "BOOL": Boolean,
+ "JSON": JSON,
+ }
+ TYPE_ARRAY_RE = re.compile(r'ARRAY<(.+)>')
+ TYPE_STRUCT_RE = re.compile(r'STRUCT<(.+)>')
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_TimeTravel, Mixin_RandomSample}
+
+ def random(self) -> str:
+ return "RAND()"
+
+ def quote(self, s: str):
+ return f"`{s}`"
+
+ def to_string(self, s: str):
+ return f"cast({s} as string)"
+
+ def type_repr(self, t) -> str:
+ try:
+ return {str: "STRING", float: "FLOAT64"}[t]
+ except KeyError:
+ return super().type_repr(t)
+
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ *args: Any, # pass-through args
+ **kwargs: Any, # pass-through args
+ ) -> ColType:
+ col_type = super().parse_type(table_path, col_name, type_repr, *args, **kwargs)
+ if isinstance(col_type, UnknownColType):
+
+ m = self.TYPE_ARRAY_RE.fullmatch(type_repr)
+ if m:
+ item_type = self.parse_type(table_path, col_name, m.group(1), *args, **kwargs)
+ col_type = Array(item_type=item_type)
+
+ # We currently ignore structs' structure, but later can parse it too. Examples:
+            # - STRUCT<INT64, STRING(10)> (unnamed)
+            # - STRUCT<foo INT64, bar STRING(10)> (named)
+            # - STRUCT<foo INT64, bar ARRAY<INT64>> (with complex fields)
+            # - STRUCT<foo INT64, bar STRUCT<a INT64, b INT64>> (nested)
+ m = self.TYPE_STRUCT_RE.fullmatch(type_repr)
+ if m:
+ col_type = Struct()
+
+ return col_type
+
+ def to_comparable(self, value: str, coltype: ColType) -> str:
+ """Ensure that the expression is comparable in ``IS DISTINCT FROM``."""
+ if isinstance(coltype, (JSON, Array, Struct)):
+ return self.normalize_value_by_type(value, coltype)
+ else:
+ return super().to_comparable(value, coltype)
+
+ def set_timezone_to_utc(self) -> str:
+ raise NotImplementedError()
+
+
+class BigQuery(Database):
+    CONNECT_URI_HELP = "bigquery://<project>/<dataset>"
+ CONNECT_URI_PARAMS = ["dataset"]
+ dialect = Dialect()
+
+ def __init__(self, project, *, dataset, **kw):
+ bigquery = import_bigquery()
+
+ self._client = bigquery.Client(project, **kw)
+ self.project = project
+ self.dataset = dataset
+
+ self.default_schema = dataset
+
+ def _normalize_returned_value(self, value):
+ if isinstance(value, bytes):
+ return value.decode()
+ return value
+
+ def _query_atom(self, sql_code: str):
+ from google.cloud import bigquery
+
+ try:
+ result = self._client.query(sql_code).result()
+ columns = [c.name for c in result.schema]
+ rows = list(result)
+ except Exception as e:
+ msg = "Exception when trying to execute SQL code:\n %s\n\nGot error: %s"
+ raise ConnectError(msg % (sql_code, e))
+
+ if rows and isinstance(rows[0], bigquery.table.Row):
+ rows = [tuple(self._normalize_returned_value(v) for v in row.values()) for row in rows]
+ return QueryResult(rows, columns)
+
+ def _query(self, sql_code: Union[str, ThreadLocalInterpreter]) -> QueryResult:
+ return apply_query(self._query_atom, sql_code)
+
+ def close(self):
+ super().close()
+ self._client.close()
+
+ def select_table_schema(self, path: DbPath) -> str:
+ project, schema, name = self._normalize_table_path(path)
+ return (
+ "SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale "
+ f"FROM `{project}`.`{schema}`.INFORMATION_SCHEMA.COLUMNS "
+ f"WHERE table_name = '{name}' AND table_schema = '{schema}'"
+ )
+
+ def query_table_unique_columns(self, path: DbPath) -> List[str]:
+ return []
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 0:
+ raise ValueError(f"{self.name}: Bad table path for {self}: ()")
+ elif len(path) == 1:
+ return (self.project, self.default_schema, path[0])
+ elif len(path) == 2:
+ return (self.project,) + path
+ elif len(path) == 3:
+ return path
+ else:
+ raise ValueError(
+ f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected form: [project.]schema.table"
+ )
+
+ def parse_table_name(self, name: str) -> DbPath:
+ path = parse_table_name(name)
+ return tuple(i for i in self._normalize_table_path(path) if i is not None)
+
+ @property
+ def is_autocommit(self) -> bool:
+ return True
diff --git a/data_diff/sqeleton/databases/clickhouse.py b/data_diff/sqeleton/databases/clickhouse.py
new file mode 100644
index 00000000..6854b070
--- /dev/null
+++ b/data_diff/sqeleton/databases/clickhouse.py
@@ -0,0 +1,196 @@
+from typing import Optional, Type
+
+from .base import (
+ MD5_HEXDIGITS,
+ CHECKSUM_HEXDIGITS,
+ TIMESTAMP_PRECISION_POS,
+ BaseDialect,
+ ThreadedDatabase,
+ import_helper,
+ ConnectError,
+ Mixin_RandomSample,
+)
+from ..abcs.database_types import (
+ ColType,
+ Decimal,
+ Float,
+ Integer,
+ FractionalType,
+ Native_UUID,
+ TemporalType,
+ Text,
+ Timestamp,
+ Boolean,
+)
+from ..abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue
+
+# https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#default-database
+DEFAULT_DATABASE = "default"
+
+
+@import_helper("clickhouse")
+def import_clickhouse():
+ import clickhouse_driver
+
+ return clickhouse_driver
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ substr_idx = 1 + MD5_HEXDIGITS - CHECKSUM_HEXDIGITS
+ return f"reinterpretAsUInt128(reverse(unhex(lowerUTF8(substr(hex(MD5({s})), {substr_idx})))))"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ # If a decimal value has trailing zeros in a fractional part, when casting to string they are dropped.
+ # For example:
+ # select toString(toDecimal128(1.10, 2)); -- the result is 1.1
+ # select toString(toDecimal128(1.00, 2)); -- the result is 1
+        # So we need a custom approach to preserve these trailing zeros:
+        # adding a small value like 0.000001 prevents the zeros from being dropped when casting.
+ # For examples above it looks like:
+ # select toString(toDecimal128(1.10, 2 + 1) + toDecimal128(0.001, 3)); -- the result is 1.101
+ # After that, cut an extra symbol from the string, i.e. 1.101 -> 1.10
+ # So, the algorithm is:
+ # 1. Cast to decimal with precision + 1
+ # 2. Add a small value 10^(-precision-1)
+ # 3. Cast the result to string
+ # 4. Drop the extra digit from the string. To do that, we need to slice the string
+ # with length = digits in an integer part + 1 (symbol of ".") + precision
+
+ if coltype.precision == 0:
+ return self.to_string(f"round({value})")
+
+ precision = coltype.precision
+        # TODO: this is too complex; is there a better-performing way?
+ value = f"""
+ if({value} >= 0, '', '-') || left(
+ toString(
+ toDecimal128(
+ round(abs({value}), {precision}),
+ {precision} + 1
+ )
+ +
+ toDecimal128(
+ exp10(-{precision + 1}),
+ {precision} + 1
+ )
+ ),
+ toUInt8(
+ greatest(
+ floor(log10(abs({value}))) + 1,
+ 1
+ )
+ ) + 1 + {precision}
+ )
+ """
+ return value
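+    # Worked example (precision=2, value=1.10): toDecimal128(round(1.10, 2), 3)
+    # + toDecimal128(0.001, 3) -> 1.101, toString -> '1.101', and
+    # left('1.101', 1 + 1 + 2) -> '1.10', preserving the trailing zero.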
+
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ prec = coltype.precision
+ if coltype.rounds:
+ timestamp = f"toDateTime64(round(toUnixTimestamp64Micro(toDateTime64({value}, 6)) / 1000000, {prec}), 6)"
+ return self.to_string(timestamp)
+
+ fractional = f"toUnixTimestamp64Micro(toDateTime64({value}, {prec})) % 1000000"
+ fractional = f"lpad({self.to_string(fractional)}, 6, '0')"
+ value = f"formatDateTime({value}, '%Y-%m-%d %H:%M:%S') || '.' || {self.to_string(fractional)}"
+ return f"rpad({value}, {TIMESTAMP_PRECISION_POS + 6}, '0')"
+
+
+class Dialect(BaseDialect):
+ name = "Clickhouse"
+ ROUNDS_ON_PREC_LOSS = False
+ TYPE_CLASSES = {
+ "Int8": Integer,
+ "Int16": Integer,
+ "Int32": Integer,
+ "Int64": Integer,
+ "Int128": Integer,
+ "Int256": Integer,
+ "UInt8": Integer,
+ "UInt16": Integer,
+ "UInt32": Integer,
+ "UInt64": Integer,
+ "UInt128": Integer,
+ "UInt256": Integer,
+ "Float32": Float,
+ "Float64": Float,
+ "Decimal": Decimal,
+ "UUID": Native_UUID,
+ "String": Text,
+ "FixedString": Text,
+ "DateTime": Timestamp,
+ "DateTime64": Timestamp,
+ "Bool": Boolean,
+ }
+ MIXINS = {Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ def quote(self, s: str) -> str:
+ return f'"{s}"'
+
+ def to_string(self, s: str) -> str:
+ return f"toString({s})"
+
+ def _convert_db_precision_to_digits(self, p: int) -> int:
+        # Done the same way as for PostgreSQL, but this needs to be rewritten differently,
+        # because it does not help for floats with a large integer part.
+ return super()._convert_db_precision_to_digits(p) - 2
+
+ def _parse_type_repr(self, type_repr: str) -> Optional[Type[ColType]]:
+ nullable_prefix = "Nullable("
+ if type_repr.startswith(nullable_prefix):
+ type_repr = type_repr[len(nullable_prefix) :].rstrip(")")
+
+ if type_repr.startswith("Decimal"):
+ type_repr = "Decimal"
+ elif type_repr.startswith("FixedString"):
+ type_repr = "FixedString"
+ elif type_repr.startswith("DateTime64"):
+ type_repr = "DateTime64"
+
+ return self.TYPE_CLASSES.get(type_repr)
+
+ # def timestamp_value(self, t: DbTime) -> str:
+ # # return f"'{t}'"
+ # return f"'{str(t)[:19]}'"
+
+ def set_timezone_to_utc(self) -> str:
+ raise NotImplementedError()
+
+ def current_timestamp(self) -> str:
+ return "now()"
+
+
+class Clickhouse(ThreadedDatabase):
+ dialect = Dialect()
+    CONNECT_URI_HELP = "clickhouse://<user>:<password>@<host>/<database>"
+ CONNECT_URI_PARAMS = ["database?"]
+
+ def __init__(self, *, thread_count: int, **kw):
+ super().__init__(thread_count=thread_count)
+
+ self._args = kw
+ # In Clickhouse database and schema are the same
+ self.default_schema = kw.get("database", DEFAULT_DATABASE)
+
+ def create_connection(self):
+ clickhouse = import_clickhouse()
+
+ class SingleConnection(clickhouse.dbapi.connection.Connection):
+ """Not thread-safe connection to Clickhouse"""
+
+ def cursor(self, cursor_factory=None):
+ if not len(self.cursors):
+ _ = super().cursor()
+ return self.cursors[0]
+
+ try:
+ return SingleConnection(**self._args)
+ except clickhouse.OperationError as e:
+ raise ConnectError(*e.args) from e
+
+ @property
+ def is_autocommit(self) -> bool:
+ return True
diff --git a/data_diff/sqeleton/databases/databricks.py b/data_diff/sqeleton/databases/databricks.py
new file mode 100644
index 00000000..fff3d906
--- /dev/null
+++ b/data_diff/sqeleton/databases/databricks.py
@@ -0,0 +1,199 @@
+import math
+from typing import Dict, Sequence
+import logging
+
+from ..abcs.database_types import (
+ Integer,
+ Float,
+ Decimal,
+ Timestamp,
+ Text,
+ TemporalType,
+ NumericType,
+ DbPath,
+ ColType,
+ UnknownColType,
+ Boolean,
+)
+from ..abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue
+from .base import (
+ MD5_HEXDIGITS,
+ CHECKSUM_HEXDIGITS,
+ BaseDialect,
+ ThreadedDatabase,
+ import_helper,
+ parse_table_name,
+ Mixin_RandomSample,
+)
+
+
+@import_helper(text="You can install it using 'pip install databricks-sql-connector'")
+def import_databricks():
+ import databricks.sql
+
+ return databricks
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"cast(conv(substr(md5({s}), {1+MD5_HEXDIGITS-CHECKSUM_HEXDIGITS}), 16, 10) as decimal(38, 0))"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ """Databricks timestamp contains no more than 6 digits in precision"""
+
+ if coltype.rounds:
+ timestamp = f"cast(round(unix_micros({value}) / 1000000, {coltype.precision}) * 1000000 as bigint)"
+ return f"date_format(timestamp_micros({timestamp}), 'yyyy-MM-dd HH:mm:ss.SSSSSS')"
+
+ precision_format = "S" * coltype.precision + "0" * (6 - coltype.precision)
+ return f"date_format({value}, 'yyyy-MM-dd HH:mm:ss.{precision_format}')"
+
+ def normalize_number(self, value: str, coltype: NumericType) -> str:
+ value = f"cast({value} as decimal(38, {coltype.precision}))"
+ if coltype.precision > 0:
+ value = f"format_number({value}, {coltype.precision})"
+ return f"replace({self.to_string(value)}, ',', '')"
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"cast ({value} as int)")
+
+
+class Dialect(BaseDialect):
+ name = "Databricks"
+ ROUNDS_ON_PREC_LOSS = True
+ TYPE_CLASSES = {
+ # Numbers
+ "INT": Integer,
+ "SMALLINT": Integer,
+ "TINYINT": Integer,
+ "BIGINT": Integer,
+ "FLOAT": Float,
+ "DOUBLE": Float,
+ "DECIMAL": Decimal,
+ # Timestamps
+ "TIMESTAMP": Timestamp,
+ # Text
+ "STRING": Text,
+ # Boolean
+ "BOOLEAN": Boolean,
+ }
+ MIXINS = {Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ def quote(self, s: str):
+ return f"`{s}`"
+
+ def to_string(self, s: str) -> str:
+ return f"cast({s} as string)"
+
+ def _convert_db_precision_to_digits(self, p: int) -> int:
+        # Subtracting 2 due to weird precision issues
+ return max(super()._convert_db_precision_to_digits(p) - 2, 0)
+
+ def set_timezone_to_utc(self) -> str:
+ return "SET TIME ZONE 'UTC'"
+
+
+class Databricks(ThreadedDatabase):
+ dialect = Dialect()
+    CONNECT_URI_HELP = "databricks://:<access_token>@<server_hostname>/<http_path>"
+ CONNECT_URI_PARAMS = ["catalog", "schema"]
+
+ def __init__(self, *, thread_count, **kw):
+ logging.getLogger("databricks.sql").setLevel(logging.WARNING)
+
+ self._args = kw
+ self.default_schema = kw.get("schema", "default")
+ self.catalog = self._args.get("catalog", "hive_metastore")
+ super().__init__(thread_count=thread_count)
+
+ def create_connection(self):
+ databricks = import_databricks()
+
+ try:
+ return databricks.sql.connect(
+ server_hostname=self._args["server_hostname"],
+ http_path=self._args["http_path"],
+ access_token=self._args["access_token"],
+ catalog=self.catalog,
+ )
+ except databricks.sql.exc.Error as e:
+ raise ConnectionError(*e.args) from e
+
+ def query_table_schema(self, path: DbPath) -> Dict[str, tuple]:
+ # Databricks has INFORMATION_SCHEMA only for Databricks Runtime, not for Databricks SQL.
+ # https://docs.databricks.com/spark/latest/spark-sql/language-manual/information-schema/columns.html
+        # So, to obtain schema information, we use a different approach.
+
+ conn = self.create_connection()
+
+ catalog, schema, table = self._normalize_table_path(path)
+ with conn.cursor() as cursor:
+ cursor.columns(catalog_name=catalog, schema_name=schema, table_name=table)
+ try:
+ rows = cursor.fetchall()
+ finally:
+ conn.close()
+ if not rows:
+ raise RuntimeError(f"{self.name}: Table '{'.'.join(path)}' does not exist, or has no columns")
+
+ d = {r.COLUMN_NAME: (r.COLUMN_NAME, r.TYPE_NAME, r.DECIMAL_DIGITS, None, None) for r in rows}
+ assert len(d) == len(rows)
+ return d
+
+ def _process_table_schema(
+ self, path: DbPath, raw_schema: Dict[str, tuple], filter_columns: Sequence[str], where: str = None
+ ):
+ accept = {i.lower() for i in filter_columns}
+ rows = [row for name, row in raw_schema.items() if name.lower() in accept]
+
+ resulted_rows = []
+ for row in rows:
+ row_type = "DECIMAL" if row[1].startswith("DECIMAL") else row[1]
+ type_cls = self.dialect.TYPE_CLASSES.get(row_type, UnknownColType)
+
+ if issubclass(type_cls, Integer):
+ row = (row[0], row_type, None, None, 0)
+
+ elif issubclass(type_cls, Float):
+ numeric_precision = math.ceil(row[2] / math.log(2, 10))
+ row = (row[0], row_type, None, numeric_precision, None)
+
+ elif issubclass(type_cls, Decimal):
+ items = row[1][8:].rstrip(")").split(",")
+ numeric_precision, numeric_scale = int(items[0]), int(items[1])
+ row = (row[0], row_type, None, numeric_precision, numeric_scale)
+
+ elif issubclass(type_cls, Timestamp):
+ row = (row[0], row_type, row[2], None, None)
+
+ else:
+ row = (row[0], row_type, None, None, None)
+
+ resulted_rows.append(row)
+
+ col_dict: Dict[str, ColType] = {row[0]: self.dialect.parse_type(path, *row) for row in resulted_rows}
+
+ self._refine_coltypes(path, col_dict, where)
+ return col_dict
+
+ def parse_table_name(self, name: str) -> DbPath:
+ path = parse_table_name(name)
+ return tuple(i for i in self._normalize_table_path(path) if i is not None)
+
+ @property
+ def is_autocommit(self) -> bool:
+ return True
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 1:
+ return self.catalog, self.default_schema, path[0]
+ elif len(path) == 2:
+ return self.catalog, path[0], path[1]
+ elif len(path) == 3:
+ return path
+
+ raise ValueError(
+ f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected format: table, schema.table, or catalog.schema.table"
+ )
diff --git a/data_diff/sqeleton/databases/duckdb.py b/data_diff/sqeleton/databases/duckdb.py
new file mode 100644
index 00000000..89be40c4
--- /dev/null
+++ b/data_diff/sqeleton/databases/duckdb.py
@@ -0,0 +1,192 @@
+from typing import Union
+
+from ..utils import match_regexps
+from ..abcs.database_types import (
+ Timestamp,
+ TimestampTZ,
+ DbPath,
+ ColType,
+ Float,
+ Decimal,
+ Integer,
+ TemporalType,
+ Native_UUID,
+ Text,
+ FractionalType,
+ Boolean,
+ AbstractTable,
+)
+from ..abcs.mixins import (
+ AbstractMixin_MD5,
+ AbstractMixin_NormalizeValue,
+ AbstractMixin_RandomSample,
+ AbstractMixin_Regex,
+)
+from .base import (
+ Database,
+ BaseDialect,
+ import_helper,
+ ConnectError,
+ ThreadLocalInterpreter,
+ TIMESTAMP_PRECISION_POS,
+)
+from .base import MD5_HEXDIGITS, CHECKSUM_HEXDIGITS, Mixin_Schema
+from ..queries.ast_classes import Func, Compilable
+from ..queries.api import code
+
+
+@import_helper("duckdb")
+def import_duckdb():
+ import duckdb
+
+ return duckdb
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"('0x' || SUBSTRING(md5({s}), {1+MD5_HEXDIGITS-CHECKSUM_HEXDIGITS},{CHECKSUM_HEXDIGITS}))::BIGINT"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+        # The precision is 6 by default. If the precision is less than 6, we trim the trailing digits.
+ if coltype.rounds and coltype.precision > 0:
+ return f"CONCAT(SUBSTRING(STRFTIME({value}::TIMESTAMP, '%Y-%m-%d %H:%M:%S.'),1,23), LPAD(((ROUND(strftime({value}::timestamp, '%f')::DECIMAL(15,7)/100000,{coltype.precision-1})*100000)::INT)::VARCHAR,6,'0'))"
+
+ return f"rpad(substring(strftime({value}::timestamp, '%Y-%m-%d %H:%M:%S.%f'),1,{TIMESTAMP_PRECISION_POS+coltype.precision}),26,'0')"
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"{value}::DECIMAL(38, {coltype.precision})")
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"{value}::INTEGER")
+
+
+class Mixin_RandomSample(AbstractMixin_RandomSample):
+ def random_sample_n(self, tbl: AbstractTable, size: int) -> AbstractTable:
+ return code("SELECT * FROM ({tbl}) USING SAMPLE {size};", tbl=tbl, size=size)
+
+ def random_sample_ratio_approx(self, tbl: AbstractTable, ratio: float) -> AbstractTable:
+ return code("SELECT * FROM ({tbl}) USING SAMPLE {percent}%;", tbl=tbl, percent=int(100 * ratio))
+
+
+class Mixin_Regex(AbstractMixin_Regex):
+ def test_regex(self, string: Compilable, pattern: Compilable) -> Compilable:
+ return Func("regexp_matches", [string, pattern])
+
+
+class Dialect(BaseDialect, Mixin_Schema):
+ name = "DuckDB"
+ ROUNDS_ON_PREC_LOSS = False
+ SUPPORTS_PRIMARY_KEY = True
+ SUPPORTS_INDEXES = True
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ TYPE_CLASSES = {
+ # Timestamps
+ "TIMESTAMP WITH TIME ZONE": TimestampTZ,
+ "TIMESTAMP": Timestamp,
+ # Numbers
+ "DOUBLE": Float,
+ "FLOAT": Float,
+ "DECIMAL": Decimal,
+ "INTEGER": Integer,
+ "BIGINT": Integer,
+ # Text
+ "VARCHAR": Text,
+ "TEXT": Text,
+ # UUID
+ "UUID": Native_UUID,
+ # Bool
+ "BOOLEAN": Boolean,
+ }
+
+ def quote(self, s: str):
+ return f'"{s}"'
+
+ def to_string(self, s: str):
+ return f"{s}::VARCHAR"
+
+ def _convert_db_precision_to_digits(self, p: int) -> int:
+        # Subtracting 2 due to weird precision issues in PostgreSQL
+ return super()._convert_db_precision_to_digits(p) - 2
+
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ datetime_precision: int = None,
+ numeric_precision: int = None,
+ numeric_scale: int = None,
+ ) -> ColType:
+ regexps = {
+ r"DECIMAL\((\d+),(\d+)\)": Decimal,
+ }
+
+ for m, t_cls in match_regexps(regexps, type_repr):
+ precision = int(m.group(2))
+ return t_cls(precision=precision)
+
+ return super().parse_type(table_path, col_name, type_repr, datetime_precision, numeric_precision, numeric_scale)
+
+ def set_timezone_to_utc(self) -> str:
+ return "SET GLOBAL TimeZone='UTC'"
+
+ def current_timestamp(self) -> str:
+ return "current_timestamp"
+
+
+class DuckDB(Database):
+ dialect = Dialect()
+ SUPPORTS_UNIQUE_CONSTAINT = False # Temporary, until we implement it
+ default_schema = "main"
+    CONNECT_URI_HELP = "duckdb://<dbname>@<dbpath>"
+ CONNECT_URI_PARAMS = ["database", "dbpath"]
+
+ def __init__(self, **kw):
+ self._args = kw
+ self._conn = self.create_connection()
+
+ @property
+ def is_autocommit(self) -> bool:
+ return True
+
+ def _query(self, sql_code: Union[str, ThreadLocalInterpreter]):
+ "Uses the standard SQL cursor interface"
+ return self._query_conn(self._conn, sql_code)
+
+ def close(self):
+ super().close()
+ self._conn.close()
+
+ def create_connection(self):
+ ddb = import_duckdb()
+ try:
+ return ddb.connect(self._args["filepath"])
+ except ddb.OperationalError as e:
+ raise ConnectError(*e.args) from e
+
+ def select_table_schema(self, path: DbPath) -> str:
+ database, schema, table = self._normalize_table_path(path)
+
+ info_schema_path = ["information_schema", "columns"]
+ if database:
+ info_schema_path.insert(0, database)
+
+ return (
+ f"SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale FROM {'.'.join(info_schema_path)} "
+ f"WHERE table_name = '{table}' AND table_schema = '{schema}'"
+ )
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 1:
+ return None, self.default_schema, path[0]
+ elif len(path) == 2:
+ return None, path[0], path[1]
+ elif len(path) == 3:
+ return path
+
+ raise ValueError(
+ f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected format: table, schema.table, or database.schema.table"
+ )
diff --git a/data_diff/sqeleton/databases/mssql.py b/data_diff/sqeleton/databases/mssql.py
new file mode 100644
index 00000000..8d394e3c
--- /dev/null
+++ b/data_diff/sqeleton/databases/mssql.py
@@ -0,0 +1,25 @@
+# class MsSQL(ThreadedDatabase):
+# "AKA sql-server"
+
+# def __init__(self, host, port, user, password, *, database, thread_count, **kw):
+# args = dict(server=host, port=port, database=database, user=user, password=password, **kw)
+# self._args = {k: v for k, v in args.items() if v is not None}
+
+# super().__init__(thread_count=thread_count)
+
+# def create_connection(self):
+# mssql = import_mssql()
+# try:
+# return mssql.connect(**self._args)
+# except mssql.Error as e:
+# raise ConnectError(*e.args) from e
+
+# def quote(self, s: str):
+# return f"[{s}]"
+
+# def md5_as_int(self, s: str) -> str:
+# return f"CONVERT(decimal(38,0), CONVERT(bigint, HashBytes('MD5', {s}), 2))"
+# # return f"CONVERT(bigint, (CHECKSUM({s})))"
+
+# def to_string(self, s: str):
+# return f"CONVERT(varchar, {s})"
diff --git a/data_diff/sqeleton/databases/mysql.py b/data_diff/sqeleton/databases/mysql.py
new file mode 100644
index 00000000..fd4bc295
--- /dev/null
+++ b/data_diff/sqeleton/databases/mysql.py
@@ -0,0 +1,146 @@
+from ..abcs.database_types import (
+ Datetime,
+ Timestamp,
+ Float,
+ Decimal,
+ Integer,
+ Text,
+ TemporalType,
+ FractionalType,
+ ColType_UUID,
+ Boolean,
+ Date,
+)
+from ..abcs.mixins import (
+ AbstractMixin_MD5,
+ AbstractMixin_NormalizeValue,
+ AbstractMixin_Regex,
+ AbstractMixin_RandomSample,
+)
+from .base import Mixin_OptimizerHints, ThreadedDatabase, import_helper, ConnectError, BaseDialect, Compilable
+from .base import MD5_HEXDIGITS, CHECKSUM_HEXDIGITS, TIMESTAMP_PRECISION_POS, Mixin_Schema, Mixin_RandomSample
+from ..queries.ast_classes import BinBoolOp
+
+
+@import_helper("mysql")
+def import_mysql():
+ import mysql.connector
+
+ return mysql.connector
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"cast(conv(substring(md5({s}), {1+MD5_HEXDIGITS-CHECKSUM_HEXDIGITS}), 16, 10) as unsigned)"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ return self.to_string(f"cast( cast({value} as datetime({coltype.precision})) as datetime(6))")
+
+ s = self.to_string(f"cast({value} as datetime(6))")
+ return f"RPAD(RPAD({s}, {TIMESTAMP_PRECISION_POS+coltype.precision}, '.'), {TIMESTAMP_PRECISION_POS+6}, '0')"
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"cast({value} as decimal(38, {coltype.precision}))")
+
+ def normalize_uuid(self, value: str, coltype: ColType_UUID) -> str:
+ return f"TRIM(CAST({value} AS char))"
+
+
+class Mixin_Regex(AbstractMixin_Regex):
+ def test_regex(self, string: Compilable, pattern: Compilable) -> Compilable:
+ return BinBoolOp("REGEXP", [string, pattern])
+
+
+class Dialect(BaseDialect, Mixin_Schema, Mixin_OptimizerHints):
+ name = "MySQL"
+ ROUNDS_ON_PREC_LOSS = True
+ SUPPORTS_PRIMARY_KEY = True
+ SUPPORTS_INDEXES = True
+ TYPE_CLASSES = {
+ # Dates
+ "datetime": Datetime,
+ "timestamp": Timestamp,
+ "date": Date,
+ # Numbers
+ "double": Float,
+ "float": Float,
+ "decimal": Decimal,
+ "int": Integer,
+ "bigint": Integer,
+ "smallint": Integer,
+ "tinyint": Integer,
+ # Text
+ "varchar": Text,
+ "char": Text,
+ "varbinary": Text,
+ "binary": Text,
+ "text": Text,
+ "mediumtext": Text,
+ "longtext": Text,
+ "tinytext": Text,
+ # Boolean
+ "boolean": Boolean,
+ }
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ def quote(self, s: str):
+ return f"`{s}`"
+
+ def to_string(self, s: str):
+ return f"cast({s} as char)"
+
+ def is_distinct_from(self, a: str, b: str) -> str:
+ return f"not ({a} <=> {b})"
+
+ def random(self) -> str:
+ return "RAND()"
+
+ def type_repr(self, t) -> str:
+ try:
+ return {
+ str: "VARCHAR(1024)",
+ }[t]
+ except KeyError:
+ return super().type_repr(t)
+
+ def explain_as_text(self, query: str) -> str:
+ return f"EXPLAIN FORMAT=TREE {query}"
+
+ def optimizer_hints(self, s: str):
+ return f"/*+ {s} */ "
+
+ def set_timezone_to_utc(self) -> str:
+ return "SET @@session.time_zone='+00:00'"
+
+
+class MySQL(ThreadedDatabase):
+ dialect = Dialect()
+ SUPPORTS_ALPHANUMS = False
+ SUPPORTS_UNIQUE_CONSTAINT = True
+    CONNECT_URI_HELP = "mysql://<user>:<password>@<host>/<database>"
+ CONNECT_URI_PARAMS = ["database?"]
+
+ def __init__(self, *, thread_count, **kw):
+ self._args = kw
+
+ super().__init__(thread_count=thread_count)
+
+ # In MySQL schema and database are synonymous
+ try:
+ self.default_schema = kw["database"]
+ except KeyError:
+ raise ValueError("MySQL URL must specify a database")
+
+ def create_connection(self):
+ mysql = import_mysql()
+ try:
+ return mysql.connect(charset="utf8", use_unicode=True, **self._args)
+ except mysql.Error as e:
+ if e.errno == mysql.errorcode.ER_ACCESS_DENIED_ERROR:
+ raise ConnectError("Bad user name or password") from e
+ elif e.errno == mysql.errorcode.ER_BAD_DB_ERROR:
+ raise ConnectError("Database does not exist") from e
+ raise ConnectError(*e.args) from e
diff --git a/data_diff/sqeleton/databases/oracle.py b/data_diff/sqeleton/databases/oracle.py
new file mode 100644
index 00000000..8c749fe3
--- /dev/null
+++ b/data_diff/sqeleton/databases/oracle.py
@@ -0,0 +1,204 @@
+from typing import Dict, List, Optional
+
+from ..utils import match_regexps
+from ..abcs.database_types import (
+ Decimal,
+ Float,
+ Text,
+ DbPath,
+ TemporalType,
+ ColType,
+ DbTime,
+ ColType_UUID,
+ Timestamp,
+ TimestampTZ,
+ FractionalType,
+)
+from ..abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue, AbstractMixin_Schema
+from ..abcs import Compilable
+from ..queries import this, table, SKIP
+from .base import (
+ BaseDialect,
+ Mixin_OptimizerHints,
+ ThreadedDatabase,
+ import_helper,
+ ConnectError,
+ QueryError,
+ Mixin_RandomSample,
+)
+from .base import TIMESTAMP_PRECISION_POS
+
+SESSION_TIME_ZONE = None # Changed by the tests
+
+
+@import_helper("oracle")
+def import_oracle():
+ import cx_Oracle
+
+ return cx_Oracle
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ # standard_hash is faster than DBMS_CRYPTO.Hash
+ # TODO: Find a way to use UTL_RAW.CAST_TO_BINARY_INTEGER ?
+ return f"to_number(substr(standard_hash({s}, 'MD5'), 18), 'xxxxxxxxxxxxxxx')"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_uuid(self, value: str, coltype: ColType_UUID) -> str:
+        # Cast is necessary for correct MD5 (trimming alone is not enough)
+ return f"CAST(TRIM({value}) AS VARCHAR(36))"
+
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ return f"to_char(cast({value} as timestamp({coltype.precision})), 'YYYY-MM-DD HH24:MI:SS.FF6')"
+
+ if coltype.precision > 0:
+ truncated = f"to_char({value}, 'YYYY-MM-DD HH24:MI:SS.FF{coltype.precision}')"
+ else:
+ truncated = f"to_char({value}, 'YYYY-MM-DD HH24:MI:SS.')"
+ return f"RPAD({truncated}, {TIMESTAMP_PRECISION_POS+6}, '0')"
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ # FM999.9990
+ format_str = "FM" + "9" * (38 - coltype.precision)
+ if coltype.precision:
+ format_str += "0." + "9" * (coltype.precision - 1) + "0"
+ return f"to_char({value}, '{format_str}')"
+
+
+class Mixin_Schema(AbstractMixin_Schema):
+ def list_tables(self, table_schema: str, like: Compilable = None) -> Compilable:
+ return (
+ table("ALL_TABLES")
+ .where(
+ this.OWNER == table_schema,
+ this.TABLE_NAME.like(like) if like is not None else SKIP,
+ )
+ .select(table_name=this.TABLE_NAME)
+ )
+
+
+class Dialect(BaseDialect, Mixin_Schema, Mixin_OptimizerHints):
+ name = "Oracle"
+ SUPPORTS_PRIMARY_KEY = True
+ SUPPORTS_INDEXES = True
+ TYPE_CLASSES: Dict[str, type] = {
+ "NUMBER": Decimal,
+ "FLOAT": Float,
+ # Text
+ "CHAR": Text,
+ "NCHAR": Text,
+ "NVARCHAR2": Text,
+ "VARCHAR2": Text,
+ "DATE": Timestamp,
+ }
+ ROUNDS_ON_PREC_LOSS = True
+ PLACEHOLDER_TABLE = "DUAL"
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ def quote(self, s: str):
+ return f'"{s}"'
+
+ def to_string(self, s: str):
+ return f"cast({s} as varchar(1024))"
+
+ def offset_limit(self, offset: Optional[int] = None, limit: Optional[int] = None):
+ if offset:
+ raise NotImplementedError("No support for OFFSET in query")
+
+ return f"FETCH NEXT {limit} ROWS ONLY"
+
+ def concat(self, items: List[str]) -> str:
+ joined_exprs = " || ".join(items)
+ return f"({joined_exprs})"
+
+ def timestamp_value(self, t: DbTime) -> str:
+ return "timestamp '%s'" % t.isoformat(" ")
+
+ def random(self) -> str:
+ return "dbms_random.value"
+
+ def is_distinct_from(self, a: str, b: str) -> str:
+ return f"DECODE({a}, {b}, 1, 0) = 0"
+
+ def type_repr(self, t) -> str:
+ try:
+ return {
+ str: "VARCHAR(1024)",
+ }[t]
+ except KeyError:
+ return super().type_repr(t)
+
+ def constant_values(self, rows) -> str:
+ return " UNION ALL ".join(
+ "SELECT %s FROM DUAL" % ", ".join(self._constant_value(v) for v in row) for row in rows
+ )
+
+ def explain_as_text(self, query: str) -> str:
+ raise NotImplementedError("Explain not yet implemented in Oracle")
+
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ datetime_precision: int = None,
+ numeric_precision: int = None,
+ numeric_scale: int = None,
+ ) -> ColType:
+ regexps = {
+ r"TIMESTAMP\((\d)\) WITH LOCAL TIME ZONE": Timestamp,
+ r"TIMESTAMP\((\d)\) WITH TIME ZONE": TimestampTZ,
+ r"TIMESTAMP\((\d)\)": Timestamp,
+ }
+
+ for m, t_cls in match_regexps(regexps, type_repr):
+ precision = int(m.group(1))
+ return t_cls(precision=precision, rounds=self.ROUNDS_ON_PREC_LOSS)
+
+ return super().parse_type(table_path, col_name, type_repr, datetime_precision, numeric_precision, numeric_scale)
+
+ def set_timezone_to_utc(self) -> str:
+ return "ALTER SESSION SET TIME_ZONE = 'UTC'"
+
+ def current_timestamp(self) -> str:
+ return "LOCALTIMESTAMP"
+
+
+class Oracle(ThreadedDatabase):
+ dialect = Dialect()
+    CONNECT_URI_HELP = "oracle://<user>:<password>@<host>/<database>"
+ CONNECT_URI_PARAMS = ["database?"]
+
+ def __init__(self, *, host, database, thread_count, **kw):
+ self.kwargs = dict(dsn=f"{host}/{database}" if database else host, **kw)
+
+ self.default_schema = kw.get("user").upper()
+
+ super().__init__(thread_count=thread_count)
+
+ def create_connection(self):
+ self._oracle = import_oracle()
+ try:
+ c = self._oracle.connect(**self.kwargs)
+ if SESSION_TIME_ZONE:
+ c.cursor().execute(f"ALTER SESSION SET TIME_ZONE = '{SESSION_TIME_ZONE}'")
+ return c
+ except Exception as e:
+ raise ConnectError(*e.args) from e
+
+ def _query_cursor(self, c, sql_code: str):
+ try:
+ return super()._query_cursor(c, sql_code)
+ except self._oracle.DatabaseError as e:
+ raise QueryError(e)
+
+ def select_table_schema(self, path: DbPath) -> str:
+ schema, name = self._normalize_table_path(path)
+
+ return (
+ f"SELECT column_name, data_type, 6 as datetime_precision, data_precision as numeric_precision, data_scale as numeric_scale"
+ f" FROM ALL_TAB_COLUMNS WHERE table_name = '{name}' AND owner = '{schema}'"
+ )
diff --git a/data_diff/sqeleton/databases/postgresql.py b/data_diff/sqeleton/databases/postgresql.py
new file mode 100644
index 00000000..4caa2f7f
--- /dev/null
+++ b/data_diff/sqeleton/databases/postgresql.py
@@ -0,0 +1,173 @@
+from ..abcs.database_types import (
+ DbPath,
+ JSON,
+ Timestamp,
+ TimestampTZ,
+ Float,
+ Decimal,
+ Integer,
+ TemporalType,
+ Native_UUID,
+ Text,
+ FractionalType,
+ Boolean,
+ Date,
+)
+from ..abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue
+from .base import BaseDialect, ThreadedDatabase, import_helper, ConnectError, Mixin_Schema
+from .base import MD5_HEXDIGITS, CHECKSUM_HEXDIGITS, _CHECKSUM_BITSIZE, TIMESTAMP_PRECISION_POS, Mixin_RandomSample
+
+SESSION_TIME_ZONE = None # Changed by the tests
+
+
+@import_helper("postgresql")
+def import_postgresql():
+ import psycopg2
+ import psycopg2.extras
+
+ psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)
+ return psycopg2
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"('x' || substring(md5({s}), {1+MD5_HEXDIGITS-CHECKSUM_HEXDIGITS}))::bit({_CHECKSUM_BITSIZE})::bigint"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ return f"to_char({value}::timestamp({coltype.precision}), 'YYYY-mm-dd HH24:MI:SS.US')"
+
+ timestamp6 = f"to_char({value}::timestamp(6), 'YYYY-mm-dd HH24:MI:SS.US')"
+ return (
+ f"RPAD(LEFT({timestamp6}, {TIMESTAMP_PRECISION_POS+coltype.precision}), {TIMESTAMP_PRECISION_POS+6}, '0')"
+ )
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"{value}::decimal(38, {coltype.precision})")
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"{value}::int")
+
+ def normalize_json(self, value: str, _coltype: JSON) -> str:
+ return f"{value}::text"
+
+
+class PostgresqlDialect(BaseDialect, Mixin_Schema):
+ name = "PostgreSQL"
+ ROUNDS_ON_PREC_LOSS = True
+ SUPPORTS_PRIMARY_KEY = True
+ SUPPORTS_INDEXES = True
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ TYPE_CLASSES = {
+ # Timestamps
+ "timestamp with time zone": TimestampTZ,
+ "timestamp without time zone": Timestamp,
+ "timestamp": Timestamp,
+ "date": Date,
+ # Numbers
+ "double precision": Float,
+ "real": Float,
+ "decimal": Decimal,
+ "smallint": Integer,
+ "integer": Integer,
+ "numeric": Decimal,
+ "bigint": Integer,
+ # Text
+ "character": Text,
+ "character varying": Text,
+ "varchar": Text,
+ "text": Text,
+
+ "json": JSON,
+ "jsonb": JSON,
+ "uuid": Native_UUID,
+ "boolean": Boolean,
+ }
+
+ def quote(self, s: str):
+ return f'"{s}"'
+
+ def to_string(self, s: str):
+ return f"{s}::varchar"
+
+ def _convert_db_precision_to_digits(self, p: int) -> int:
+        # Subtracting 2 due to weird precision issues in PostgreSQL
+ return super()._convert_db_precision_to_digits(p) - 2
+
+ def set_timezone_to_utc(self) -> str:
+ return "SET TIME ZONE 'UTC'"
+
+ def current_timestamp(self) -> str:
+ return "current_timestamp"
+
+ def type_repr(self, t) -> str:
+ if isinstance(t, TimestampTZ):
+ return f"timestamp ({t.precision}) with time zone"
+ return super().type_repr(t)
+
+
+class PostgreSQL(ThreadedDatabase):
+ dialect = PostgresqlDialect()
+ SUPPORTS_UNIQUE_CONSTAINT = True
+    CONNECT_URI_HELP = "postgresql://<user>:<password>@<host>/<database>"
+ CONNECT_URI_PARAMS = ["database?"]
+
+ default_schema = "public"
+
+ def __init__(self, *, thread_count, **kw):
+ self._args = kw
+
+ super().__init__(thread_count=thread_count)
+
+ def create_connection(self):
+ if not self._args:
+ self._args["host"] = None # psycopg2 requires 1+ arguments
+
+ pg = import_postgresql()
+ try:
+ c = pg.connect(**self._args)
+ if SESSION_TIME_ZONE:
+ c.cursor().execute(f"SET TIME ZONE '{SESSION_TIME_ZONE}'")
+ return c
+ except pg.OperationalError as e:
+ raise ConnectError(*e.args) from e
+
+ def select_table_schema(self, path: DbPath) -> str:
+ database, schema, table = self._normalize_table_path(path)
+
+ info_schema_path = ["information_schema", "columns"]
+ if database:
+ info_schema_path.insert(0, database)
+
+ return (
+ f"SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale FROM {'.'.join(info_schema_path)} "
+ f"WHERE table_name = '{table}' AND table_schema = '{schema}'"
+ )
+
+ def select_table_unique_columns(self, path: DbPath) -> str:
+ database, schema, table = self._normalize_table_path(path)
+
+ info_schema_path = ["information_schema", "key_column_usage"]
+ if database:
+ info_schema_path.insert(0, database)
+
+ return (
+ "SELECT column_name "
+ f"FROM {'.'.join(info_schema_path)} "
+ f"WHERE table_name = '{table}' AND table_schema = '{schema}'"
+ )
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 1:
+ return None, self.default_schema, path[0]
+ elif len(path) == 2:
+ return None, path[0], path[1]
+ elif len(path) == 3:
+ return path
+
+ raise ValueError(
+ f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected format: table, schema.table, or database.schema.table"
+ )
diff --git a/data_diff/sqeleton/databases/presto.py b/data_diff/sqeleton/databases/presto.py
new file mode 100644
index 00000000..efb55728
--- /dev/null
+++ b/data_diff/sqeleton/databases/presto.py
@@ -0,0 +1,195 @@
+from functools import partial
+import re
+
+from ..utils import match_regexps
+
+from ..abcs.database_types import (
+ Timestamp,
+ TimestampTZ,
+ Integer,
+ Float,
+ Text,
+ FractionalType,
+ DbPath,
+ DbTime,
+ Decimal,
+ ColType,
+ ColType_UUID,
+ TemporalType,
+ Boolean,
+)
+from ..abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue
+from .base import BaseDialect, Database, import_helper, ThreadLocalInterpreter, Mixin_Schema, Mixin_RandomSample
+from .base import (
+ MD5_HEXDIGITS,
+ CHECKSUM_HEXDIGITS,
+ TIMESTAMP_PRECISION_POS,
+)
+
+
+def query_cursor(c, sql_code):
+ c.execute(sql_code)
+ if sql_code.lower().startswith("select"):
+ return c.fetchall()
+ # Required for the query to actually run 🤯
+ if re.match(r"(insert|create|truncate|drop|explain)", sql_code, re.IGNORECASE):
+ return c.fetchone()
+
+
+@import_helper("presto")
+def import_presto():
+ import prestodb
+
+ return prestodb
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"cast(from_base(substr(to_hex(md5(to_utf8({s}))), {1+MD5_HEXDIGITS-CHECKSUM_HEXDIGITS}), 16) as decimal(38, 0))"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_uuid(self, value: str, coltype: ColType_UUID) -> str:
+ # Trim doesn't work on CHAR type
+ return f"TRIM(CAST({value} AS VARCHAR))"
+
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ # TODO rounds
+ if coltype.rounds:
+ s = f"date_format(cast({value} as timestamp(6)), '%Y-%m-%d %H:%i:%S.%f')"
+ else:
+ s = f"date_format(cast({value} as timestamp(6)), '%Y-%m-%d %H:%i:%S.%f')"
+
+ return f"RPAD(RPAD({s}, {TIMESTAMP_PRECISION_POS+coltype.precision}, '.'), {TIMESTAMP_PRECISION_POS+6}, '0')"
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"cast({value} as decimal(38,{coltype.precision}))")
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"cast ({value} as int)")
+
+
+class Dialect(BaseDialect, Mixin_Schema):
+ name = "Presto"
+ ROUNDS_ON_PREC_LOSS = True
+ TYPE_CLASSES = {
+ # Timestamps
+ "timestamp with time zone": TimestampTZ,
+ "timestamp without time zone": Timestamp,
+ "timestamp": Timestamp,
+ # Numbers
+ "integer": Integer,
+ "bigint": Integer,
+ "real": Float,
+ "double": Float,
+ # Text
+ "varchar": Text,
+ # Boolean
+ "boolean": Boolean,
+ }
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ def explain_as_text(self, query: str) -> str:
+ return f"EXPLAIN (FORMAT TEXT) {query}"
+
+ def type_repr(self, t) -> str:
+ if isinstance(t, TimestampTZ):
+ return f"timestamp with time zone"
+
+ try:
+ return {float: "REAL"}[t]
+ except KeyError:
+ return super().type_repr(t)
+
+ def timestamp_value(self, t: DbTime) -> str:
+ return f"timestamp '{t.isoformat(' ')}'"
+
+ def quote(self, s: str):
+ return f'"{s}"'
+
+ def to_string(self, s: str):
+ return f"cast({s} as varchar)"
+
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ datetime_precision: int = None,
+ numeric_precision: int = None,
+ _numeric_scale: int = None,
+ ) -> ColType:
+ timestamp_regexps = {
+ r"timestamp\((\d)\)": Timestamp,
+ r"timestamp\((\d)\) with time zone": TimestampTZ,
+ }
+ for m, t_cls in match_regexps(timestamp_regexps, type_repr):
+ precision = int(m.group(1))
+ return t_cls(precision=precision, rounds=self.ROUNDS_ON_PREC_LOSS)
+
+ number_regexps = {r"decimal\((\d+),(\d+)\)": Decimal}
+ for m, n_cls in match_regexps(number_regexps, type_repr):
+ _prec, scale = map(int, m.groups())
+ return n_cls(scale)
+
+ string_regexps = {r"varchar\((\d+)\)": Text, r"char\((\d+)\)": Text}
+ for m, n_cls in match_regexps(string_regexps, type_repr):
+ return n_cls()
+
+ return super().parse_type(table_path, col_name, type_repr, datetime_precision, numeric_precision)
+
+ def set_timezone_to_utc(self) -> str:
+ return "SET TIME ZONE '+00:00'"
+
+ def current_timestamp(self) -> str:
+ return "current_timestamp"
+
+
+class Presto(Database):
+ dialect = Dialect()
+    CONNECT_URI_HELP = "presto://<user>@<host>/<catalog>/<schema>"
+ CONNECT_URI_PARAMS = ["catalog", "schema"]
+
+ default_schema = "public"
+
+ def __init__(self, **kw):
+ prestodb = import_presto()
+
+ if kw.get("schema"):
+ self.default_schema = kw.get("schema")
+
+ if kw.get("auth") == "basic": # if auth=basic, add basic authenticator for Presto
+ kw["auth"] = prestodb.auth.BasicAuthentication(kw.pop("user"), kw.pop("password"))
+
+ if "cert" in kw: # if a certificate was specified in URI, verify session with cert
+ cert = kw.pop("cert")
+ self._conn = prestodb.dbapi.connect(**kw)
+ self._conn._http_session.verify = cert
+ else:
+ self._conn = prestodb.dbapi.connect(**kw)
+
+ def _query(self, sql_code: str) -> list:
+ "Uses the standard SQL cursor interface"
+ c = self._conn.cursor()
+
+ if isinstance(sql_code, ThreadLocalInterpreter):
+ return sql_code.apply_queries(partial(query_cursor, c))
+
+ return query_cursor(c, sql_code)
+
+ def close(self):
+ super().close()
+ self._conn.close()
+
+ def select_table_schema(self, path: DbPath) -> str:
+ schema, table = self._normalize_table_path(path)
+
+ return (
+ "SELECT column_name, data_type, 3 as datetime_precision, 3 as numeric_precision, NULL as numeric_scale "
+ "FROM INFORMATION_SCHEMA.COLUMNS "
+ f"WHERE table_name = '{table}' AND table_schema = '{schema}'"
+ )
+
+ @property
+ def is_autocommit(self) -> bool:
+ return False
diff --git a/data_diff/sqeleton/databases/redshift.py b/data_diff/sqeleton/databases/redshift.py
new file mode 100644
index 00000000..662ad55e
--- /dev/null
+++ b/data_diff/sqeleton/databases/redshift.py
@@ -0,0 +1,176 @@
+from typing import List, Dict
+from ..abcs.database_types import (
+ Float,
+ JSON,
+ TemporalType,
+ FractionalType,
+ DbPath,
+ TimestampTZ,
+)
+from ..abcs.mixins import AbstractMixin_MD5
+from .postgresql import (
+ PostgreSQL,
+ MD5_HEXDIGITS,
+ CHECKSUM_HEXDIGITS,
+ TIMESTAMP_PRECISION_POS,
+ PostgresqlDialect,
+ Mixin_NormalizeValue,
+)
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"strtol(substring(md5({s}), {1+MD5_HEXDIGITS-CHECKSUM_HEXDIGITS}), 16)::decimal(38)"
+
+
+class Mixin_NormalizeValue(Mixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ timestamp = f"{value}::timestamp(6)"
+ # Get seconds since epoch. Redshift doesn't support milli- or micro-seconds.
+ secs = f"timestamp 'epoch' + round(extract(epoch from {timestamp})::decimal(38)"
+ # Get the milliseconds from timestamp.
+ ms = f"extract(ms from {timestamp})"
+ # Get the microseconds from timestamp, without the milliseconds!
+ us = f"extract(us from {timestamp})"
+ # epoch = Total time since epoch in microseconds.
+ epoch = f"{secs}*1000000 + {ms}*1000 + {us}"
+ timestamp6 = (
+ f"to_char({epoch}, -6+{coltype.precision}) * interval '0.000001 seconds', 'YYYY-mm-dd HH24:MI:SS.US')"
+ )
+ else:
+ timestamp6 = f"to_char({value}::timestamp(6), 'YYYY-mm-dd HH24:MI:SS.US')"
+ return (
+ f"RPAD(LEFT({timestamp6}, {TIMESTAMP_PRECISION_POS+coltype.precision}), {TIMESTAMP_PRECISION_POS+6}, '0')"
+ )
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"{value}::decimal(38,{coltype.precision})")
+
+ def normalize_json(self, value: str, _coltype: JSON) -> str:
+ return f'nvl2({value}, json_serialize({value}), NULL)'
+
+
+class Dialect(PostgresqlDialect):
+ name = "Redshift"
+ TYPE_CLASSES = {
+ **PostgresqlDialect.TYPE_CLASSES,
+ "double": Float,
+ "real": Float,
+ "super": JSON,
+ }
+ SUPPORTS_INDEXES = False
+
+ def concat(self, items: List[str]) -> str:
+ joined_exprs = " || ".join(items)
+ return f"({joined_exprs})"
+
+ def is_distinct_from(self, a: str, b: str) -> str:
+ return f"({a} IS NULL != {b} IS NULL) OR ({a}!={b})"
+
+ def type_repr(self, t) -> str:
+ if isinstance(t, TimestampTZ):
+ return f"timestamptz"
+ return super().type_repr(t)
+
+
+class Redshift(PostgreSQL):
+ dialect = Dialect()
+ CONNECT_URI_HELP = "redshift://:@/"
+ CONNECT_URI_PARAMS = ["database?"]
+
+ def select_table_schema(self, path: DbPath) -> str:
+ database, schema, table = self._normalize_table_path(path)
+
+ info_schema_path = ["information_schema", "columns"]
+ if database:
+ info_schema_path.insert(0, database)
+
+ return (
+ f"SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale FROM {'.'.join(info_schema_path)} "
+ f"WHERE table_name = '{table.lower()}' AND table_schema = '{schema.lower()}'"
+ )
+
+ def select_external_table_schema(self, path: DbPath) -> str:
+ database, schema, table = self._normalize_table_path(path)
+
+ db_clause = ""
+ if database:
+ db_clause = f" AND redshift_database_name = '{database.lower()}'"
+
+ return (
+ f"""SELECT
+ columnname AS column_name
+ , CASE WHEN external_type = 'string' THEN 'varchar' ELSE external_type END AS data_type
+ , NULL AS datetime_precision
+ , NULL AS numeric_precision
+ , NULL AS numeric_scale
+ FROM svv_external_columns
+ WHERE tablename = '{table.lower()}' AND schemaname = '{schema.lower()}'
+ """
+ + db_clause
+ )
+
+ def query_external_table_schema(self, path: DbPath) -> Dict[str, tuple]:
+ rows = self.query(self.select_external_table_schema(path), list)
+ if not rows:
+ raise RuntimeError(f"{self.name}: Table '{'.'.join(path)}' does not exist, or has no columns")
+
+ d = {r[0]: r for r in rows}
+ assert len(d) == len(rows)
+ return d
+
+ def select_view_columns(self, path: DbPath) -> str:
+ _, schema, table = self._normalize_table_path(path)
+
+ return (
+ """select * from pg_get_cols('{}.{}')
+ cols(view_schema name, view_name name, col_name name, col_type varchar, col_num int)
+ """.format(schema, table)
+ )
+
+ def query_pg_get_cols(self, path: DbPath) -> Dict[str, tuple]:
+ rows = self.query(self.select_view_columns(path), list)
+
+ if not rows:
+ raise RuntimeError(f"{self.name}: View '{'.'.join(path)}' does not exist, or has no columns")
+
+ output = {}
+ for r in rows:
+ col_name = r[2]
+ type_info = r[3].split('(')
+ base_type = type_info[0]
+ precision = None
+ scale = None
+
+ if len(type_info) > 1:
+ if base_type == 'numeric':
+ precision, scale = type_info[1][:-1].split(',')
+ precision = int(precision)
+ scale = int(scale)
+
+ out = [col_name, base_type, None, precision, scale]
+ output[col_name] = tuple(out)
+
+ return output
+
+ def query_table_schema(self, path: DbPath) -> Dict[str, tuple]:
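+ """Query the table schema from information_schema; fall back to svv_external_columns (external/Spectrum tables), and then to pg_get_cols() (views)."""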
+ try:
+ return super().query_table_schema(path)
+ except RuntimeError:
+ try:
+ return self.query_external_table_schema(path)
+ except RuntimeError:
+ return self.query_pg_get_cols(path)
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 1:
+ return None, self.default_schema, path[0]
+ elif len(path) == 2:
+ return None, path[0], path[1]
+ elif len(path) == 3:
+ return path
+
+ raise ValueError(
+ f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected format: table, schema.table, or database.schema.table"
+ )
diff --git a/data_diff/sqeleton/databases/snowflake.py b/data_diff/sqeleton/databases/snowflake.py
new file mode 100644
index 00000000..2d2c77cd
--- /dev/null
+++ b/data_diff/sqeleton/databases/snowflake.py
@@ -0,0 +1,228 @@
+from typing import Union, List
+import logging
+
+from ..abcs.database_types import (
+ Timestamp,
+ TimestampTZ,
+ Decimal,
+ Float,
+ Text,
+ FractionalType,
+ TemporalType,
+ DbPath,
+ Boolean,
+ Date,
+)
+from ..abcs.mixins import (
+ AbstractMixin_MD5,
+ AbstractMixin_NormalizeValue,
+ AbstractMixin_Schema,
+ AbstractMixin_TimeTravel,
+)
+from ..abcs import Compilable
+from data_diff.sqeleton.queries import table, this, SKIP, code
+from .base import (
+ BaseDialect,
+ ConnectError,
+ Database,
+ import_helper,
+ CHECKSUM_MASK,
+ ThreadLocalInterpreter,
+ Mixin_RandomSample,
+)
+
+
+@import_helper("snowflake")
+def import_snowflake():
+ import snowflake.connector
+ from cryptography.hazmat.primitives import serialization
+ from cryptography.hazmat.backends import default_backend
+
+ return snowflake, serialization, default_backend
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"BITAND(md5_number_lower64({s}), {CHECKSUM_MASK})"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
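+ # Normalize to UTC at the column's precision and render as 'YYYY-MM-DD HH24:MI:SS.FF6',
+ # so timestamps compare as fixed-width strings across databases.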
+ if coltype.rounds:
+ timestamp = f"to_timestamp(round(date_part(epoch_nanosecond, convert_timezone('UTC', {value})::timestamp(9))/1000000000, {coltype.precision}))"
+ else:
+ timestamp = f"cast(convert_timezone('UTC', {value}) as timestamp({coltype.precision}))"
+
+ return f"to_char({timestamp}, 'YYYY-MM-DD HH24:MI:SS.FF6')"
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"cast({value} as decimal(38, {coltype.precision}))")
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"{value}::int")
+
+
+class Mixin_Schema(AbstractMixin_Schema):
+ def table_information(self) -> Compilable:
+ return table("INFORMATION_SCHEMA", "TABLES")
+
+ def list_tables(self, table_schema: str, like: Compilable = None) -> Compilable:
+ return (
+ self.table_information()
+ .where(
+ this.TABLE_SCHEMA == table_schema,
+ this.TABLE_NAME.like(like) if like is not None else SKIP,
+ this.TABLE_TYPE == "BASE TABLE",
+ )
+ .select(table_name=this.TABLE_NAME)
+ )
+
+
+class Mixin_TimeTravel(AbstractMixin_TimeTravel):
+ def time_travel(
+ self,
+ table: Compilable,
+ before: bool = False,
+ timestamp: Compilable = None,
+ offset: Compilable = None,
+ statement: Compilable = None,
+ ) -> Compilable:
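+ # Builds a Snowflake time-travel clause: "<table> AT|BEFORE(<timestamp|offset|statement> => <value>)".
+ # Exactly one of `timestamp`, `offset` or `statement` may be given.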
+ at_or_before = "AT" if before else "BEFORE"
+ if timestamp is not None:
+ assert offset is None and statement is None
+ key = "timestamp"
+ value = timestamp
+ elif offset is not None:
+ assert statement is None
+ key = "offset"
+ value = offset
+ else:
+ assert statement is not None
+ key = "statement"
+ value = statement
+
+ return code(f"{{table}} {at_or_before}({key} => {{value}})", table=table, value=value)
+
+
+class Dialect(BaseDialect, Mixin_Schema):
+ name = "Snowflake"
+ ROUNDS_ON_PREC_LOSS = False
+ TYPE_CLASSES = {
+ # Timestamps
+ "TIMESTAMP_NTZ": Timestamp,
+ "TIMESTAMP_LTZ": Timestamp,
+ "TIMESTAMP_TZ": TimestampTZ,
+ "DATE": Date,
+ # Numbers
+ "NUMBER": Decimal,
+ "FLOAT": Float,
+ # Text
+ "TEXT": Text,
+ # Boolean
+ "BOOLEAN": Boolean,
+ }
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_TimeTravel, Mixin_RandomSample}
+
+ def explain_as_text(self, query: str) -> str:
+ return f"EXPLAIN USING TEXT {query}"
+
+ def quote(self, s: str):
+ return f'"{s}"'
+
+ def to_string(self, s: str):
+ return f"cast({s} as string)"
+
+ def table_information(self) -> Compilable:
+ return table("INFORMATION_SCHEMA", "TABLES")
+
+ def set_timezone_to_utc(self) -> str:
+ return "ALTER SESSION SET TIMEZONE = 'UTC'"
+
+ def optimizer_hints(self, hints: str) -> str:
+ raise NotImplementedError("Optimizer hints not yet implemented in snowflake")
+
+ def type_repr(self, t) -> str:
+ if isinstance(t, TimestampTZ):
+ return f"timestamp_tz({t.precision})"
+ return super().type_repr(t)
+
+
+class Snowflake(Database):
+ dialect = Dialect()
+ CONNECT_URI_HELP = "snowflake://:@//?warehouse="
+ CONNECT_URI_PARAMS = ["database", "schema"]
+ CONNECT_URI_KWPARAMS = ["warehouse"]
+
+ def __init__(self, *, schema: str, **kw):
+ snowflake, serialization, default_backend = import_snowflake()
+ logging.getLogger("snowflake.connector").setLevel(logging.WARNING)
+
+ # Ignore the error: snowflake.connector.network.RetryRequest: could not find io module state
+ # It's a known issue: https://github.com/snowflakedb/snowflake-connector-python/issues/145
+ logging.getLogger("snowflake.connector.network").disabled = True
+
+ assert '"' not in schema, "Schema name should not contain quotes!"
+ # If a private key is used, read it from the specified path and pass it as "private_key" to the connector.
+ if "key" in kw:
+ with open(kw.get("key"), "rb") as key:
+ if "password" in kw:
+ raise ConnectError("Cannot use password and key at the same time")
+ if kw.get("private_key_passphrase"):
+ encoded_passphrase = kw.get("private_key_passphrase").encode()
+ else:
+ encoded_passphrase = None
+ p_key = serialization.load_pem_private_key(
+ key.read(),
+ password=encoded_passphrase,
+ backend=default_backend(),
+ )
+
+ kw["private_key"] = p_key.private_bytes(
+ encoding=serialization.Encoding.DER,
+ format=serialization.PrivateFormat.PKCS8,
+ encryption_algorithm=serialization.NoEncryption(),
+ )
+
+ self._conn = snowflake.connector.connect(schema=f'"{schema}"', **kw)
+
+ self.default_schema = schema
+
+ def close(self):
+ super().close()
+ self._conn.close()
+
+ def _query(self, sql_code: Union[str, ThreadLocalInterpreter]):
+ "Uses the standard SQL cursor interface"
+ return self._query_conn(self._conn, sql_code)
+
+ def select_table_schema(self, path: DbPath) -> str:
+ """Provide SQL for selecting the table schema as (name, type, date_prec, num_prec)"""
+ database, schema, name = self._normalize_table_path(path)
+ info_schema_path = ["information_schema", "columns"]
+ if database:
+ info_schema_path.insert(0, database)
+
+ return (
+ "SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale "
+ f"FROM {'.'.join(info_schema_path)} "
+ f"WHERE table_name = '{name}' AND table_schema = '{schema}'"
+ )
+
+ def _normalize_table_path(self, path: DbPath) -> DbPath:
+ if len(path) == 1:
+ return None, self.default_schema, path[0]
+ elif len(path) == 2:
+ return None, path[0], path[1]
+ elif len(path) == 3:
+ return path
+
+ raise ValueError(
+ f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected format: table, schema.table, or database.schema.table"
+ )
+
+ @property
+ def is_autocommit(self) -> bool:
+ return True
+
+ def query_table_unique_columns(self, path: DbPath) -> List[str]:
+ return []
diff --git a/data_diff/sqeleton/databases/trino.py b/data_diff/sqeleton/databases/trino.py
new file mode 100644
index 00000000..f997447d
--- /dev/null
+++ b/data_diff/sqeleton/databases/trino.py
@@ -0,0 +1,47 @@
+from ..abcs.database_types import TemporalType, ColType_UUID
+from . import presto
+from .base import import_helper
+from .base import TIMESTAMP_PRECISION_POS
+
+
+@import_helper("trino")
+def import_trino():
+ import trino
+
+ return trino
+
+
+Mixin_MD5 = presto.Mixin_MD5
+
+
+class Mixin_NormalizeValue(presto.Mixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ s = f"date_format(cast({value} as timestamp({coltype.precision})), '%Y-%m-%d %H:%i:%S.%f')"
+ else:
+ s = f"date_format(cast({value} as timestamp(6)), '%Y-%m-%d %H:%i:%S.%f')"
+
+ return (
+ f"RPAD(RPAD({s}, {TIMESTAMP_PRECISION_POS + coltype.precision}, '.'), {TIMESTAMP_PRECISION_POS + 6}, '0')"
+ )
+
+ def normalize_uuid(self, value: str, coltype: ColType_UUID) -> str:
+ return f"TRIM({value})"
+
+
+class Dialect(presto.Dialect):
+ name = "Trino"
+
+
+class Trino(presto.Presto):
+ dialect = Dialect()
+ CONNECT_URI_HELP = "trino://@//"
+ CONNECT_URI_PARAMS = ["catalog", "schema"]
+
+ def __init__(self, **kw):
+ trino = import_trino()
+
+ if kw.get("schema"):
+ self.default_schema = kw.get("schema")
+
+ self._conn = trino.dbapi.connect(**kw)
diff --git a/data_diff/sqeleton/databases/vertica.py b/data_diff/sqeleton/databases/vertica.py
new file mode 100644
index 00000000..3f853eae
--- /dev/null
+++ b/data_diff/sqeleton/databases/vertica.py
@@ -0,0 +1,181 @@
+from typing import List
+
+from ..utils import match_regexps
+from .base import (
+ CHECKSUM_HEXDIGITS,
+ MD5_HEXDIGITS,
+ TIMESTAMP_PRECISION_POS,
+ BaseDialect,
+ ConnectError,
+ DbPath,
+ ColType,
+ ThreadedDatabase,
+ import_helper,
+ Mixin_RandomSample,
+)
+from ..abcs.database_types import (
+ Decimal,
+ Float,
+ FractionalType,
+ Integer,
+ TemporalType,
+ Text,
+ Timestamp,
+ TimestampTZ,
+ Boolean,
+ ColType_UUID,
+)
+from ..abcs.mixins import AbstractMixin_MD5, AbstractMixin_NormalizeValue, AbstractMixin_Schema
+from ..abcs import Compilable
+from ..queries import table, this, SKIP
+
+
+@import_helper("vertica")
+def import_vertica():
+ import vertica_python
+
+ return vertica_python
+
+
+class Mixin_MD5(AbstractMixin_MD5):
+ def md5_as_int(self, s: str) -> str:
+ return f"CAST(HEX_TO_INTEGER(SUBSTRING(MD5({s}), {1 + MD5_HEXDIGITS - CHECKSUM_HEXDIGITS})) AS NUMERIC(38, 0))"
+
+
+class Mixin_NormalizeValue(AbstractMixin_NormalizeValue):
+ def normalize_timestamp(self, value: str, coltype: TemporalType) -> str:
+ if coltype.rounds:
+ return f"TO_CHAR({value}::TIMESTAMP({coltype.precision}), 'YYYY-MM-DD HH24:MI:SS.US')"
+
+ timestamp6 = f"TO_CHAR({value}::TIMESTAMP(6), 'YYYY-MM-DD HH24:MI:SS.US')"
+ return (
+ f"RPAD(LEFT({timestamp6}, {TIMESTAMP_PRECISION_POS+coltype.precision}), {TIMESTAMP_PRECISION_POS+6}, '0')"
+ )
+
+ def normalize_number(self, value: str, coltype: FractionalType) -> str:
+ return self.to_string(f"CAST({value} AS NUMERIC(38, {coltype.precision}))")
+
+ def normalize_uuid(self, value: str, _coltype: ColType_UUID) -> str:
+ # Trim doesn't work on CHAR type
+ return f"TRIM(CAST({value} AS VARCHAR))"
+
+ def normalize_boolean(self, value: str, _coltype: Boolean) -> str:
+ return self.to_string(f"cast ({value} as int)")
+
+
+class Mixin_Schema(AbstractMixin_Schema):
+ def table_information(self) -> Compilable:
+ return table("v_catalog", "tables")
+
+ def list_tables(self, table_schema: str, like: Compilable = None) -> Compilable:
+ return (
+ self.table_information()
+ .where(
+ this.table_schema == table_schema,
+ this.table_name.like(like) if like is not None else SKIP,
+ )
+ .select(this.table_name)
+ )
+
+
+class Dialect(BaseDialect, Mixin_Schema):
+ name = "Vertica"
+ ROUNDS_ON_PREC_LOSS = True
+
+ TYPE_CLASSES = {
+ # Timestamps
+ "timestamp": Timestamp,
+ "timestamptz": TimestampTZ,
+ # Numbers
+ "numeric": Decimal,
+ "int": Integer,
+ "float": Float,
+ # Text
+ "char": Text,
+ "varchar": Text,
+ # Boolean
+ "boolean": Boolean,
+ }
+ MIXINS = {Mixin_Schema, Mixin_MD5, Mixin_NormalizeValue, Mixin_RandomSample}
+
+ def quote(self, s: str):
+ return f'"{s}"'
+
+ def concat(self, items: List[str]) -> str:
+ return " || ".join(items)
+
+ def to_string(self, s: str) -> str:
+ return f"CAST({s} AS VARCHAR)"
+
+ def is_distinct_from(self, a: str, b: str) -> str:
+ return f"not ({a} <=> {b})"
+
+ def parse_type(
+ self,
+ table_path: DbPath,
+ col_name: str,
+ type_repr: str,
+ datetime_precision: int = None,
+ numeric_precision: int = None,
+ numeric_scale: int = None,
+ ) -> ColType:
+ timestamp_regexps = {
+ r"timestamp\(?(\d?)\)?": Timestamp,
+ r"timestamptz\(?(\d?)\)?": TimestampTZ,
+ }
+ for m, t_cls in match_regexps(timestamp_regexps, type_repr):
+ precision = int(m.group(1)) if m.group(1) else 6
+ return t_cls(precision=precision, rounds=self.ROUNDS_ON_PREC_LOSS)
+
+ number_regexps = {
+ r"numeric\((\d+),(\d+)\)": Decimal,
+ }
+ for m, n_cls in match_regexps(number_regexps, type_repr):
+ _prec, scale = map(int, m.groups())
+ return n_cls(scale)
+
+ string_regexps = {
+ r"varchar\((\d+)\)": Text,
+ r"char\((\d+)\)": Text,
+ }
+ for m, n_cls in match_regexps(string_regexps, type_repr):
+ return n_cls()
+
+ return super().parse_type(table_path, col_name, type_repr, datetime_precision, numeric_precision)
+
+ def set_timezone_to_utc(self) -> str:
+ return "SET TIME ZONE TO 'UTC'"
+
+ def current_timestamp(self) -> str:
+ return "current_timestamp(6)"
+
+
+class Vertica(ThreadedDatabase):
+ dialect = Dialect()
+ CONNECT_URI_HELP = "vertica://:@/"
+ CONNECT_URI_PARAMS = ["database?"]
+
+ default_schema = "public"
+
+ def __init__(self, *, thread_count, **kw):
+ self._args = kw
+ self._args["AUTOCOMMIT"] = False
+
+ super().__init__(thread_count=thread_count)
+
+ def create_connection(self):
+ vertica = import_vertica()
+ try:
+ c = vertica.connect(**self._args)
+ return c
+ except vertica.errors.ConnectionError as e:
+ raise ConnectError(*e.args) from e
+
+ def select_table_schema(self, path: DbPath) -> str:
+ schema, name = self._normalize_table_path(path)
+
+ return (
+ "SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale "
+ "FROM V_CATALOG.COLUMNS "
+ f"WHERE table_name = '{name}' AND table_schema = '{schema}'"
+ )
diff --git a/data_diff/sqeleton/queries/__init__.py b/data_diff/sqeleton/queries/__init__.py
new file mode 100644
index 00000000..0de76b42
--- /dev/null
+++ b/data_diff/sqeleton/queries/__init__.py
@@ -0,0 +1,25 @@
+from .compiler import Compiler, CompileError
+from .api import (
+ this,
+ join,
+ outerjoin,
+ table,
+ SKIP,
+ sum_,
+ avg,
+ min_,
+ max_,
+ cte,
+ commit,
+ when,
+ coalesce,
+ and_,
+ if_,
+ or_,
+ leftjoin,
+ rightjoin,
+ current_timestamp,
+ code,
+)
+from .ast_classes import Expr, ExprNode, Select, Count, BinOp, Explain, In, Code, Column
+from .extras import Checksum, NormalizeAsString, ApplyFuncAndNormalizeAsString
diff --git a/data_diff/sqeleton/queries/api.py b/data_diff/sqeleton/queries/api.py
new file mode 100644
index 00000000..fc0affb5
--- /dev/null
+++ b/data_diff/sqeleton/queries/api.py
@@ -0,0 +1,202 @@
+from typing import Optional
+
+from ..utils import CaseAwareMapping, CaseSensitiveDict
+from .ast_classes import *
+from .base import args_as_tuple
+
+
+this = This()
+
+
+def join(*tables: ITable) -> Join:
+ """Inner-join a sequence of table expressions"
+
+ When joining, it's recommended to use explicit tables names, instead of `this`, in order to avoid potential name collisions.
+
+ Example:
+ ::
+
+ person = table('person')
+ city = table('city')
+
+ name_and_city = (
+ join(person, city)
+ .on(person['city_id'] == city['id'])
+ .select(person['id'], city['name'])
+ )
+ """
+ return Join(tables)
+
+
+def leftjoin(*tables: ITable):
+ """Left-joins a sequence of table expressions.
+
+ See Also: ``join()``
+ """
+ return Join(tables, "LEFT")
+
+
+def rightjoin(*tables: ITable):
+ """Right-joins a sequence of table expressions.
+
+ See Also: ``join()``
+ """
+ return Join(tables, "RIGHT")
+
+
+def outerjoin(*tables: ITable):
+ """Outer-joins a sequence of table expressions.
+
+ See Also: ``join()``
+ """
+ return Join(tables, "FULL OUTER")
+
+
+def cte(expr: Expr, *, name: Optional[str] = None, params: Sequence[str] = None):
+ """Define a CTE"""
+ return Cte(expr, name, params)
+
+
+def table(*path: str, schema: Union[dict, CaseAwareMapping] = None) -> TablePath:
+ """Defines a table with a path (dotted name), and optionally a schema.
+
+ Parameters:
+ path: A list of names that make up the path to the table.
+ schema: a dictionary of {name: type}
+ """
+ if len(path) == 1 and isinstance(path[0], tuple):
+ (path,) = path
+ if not all(isinstance(i, str) for i in path):
+ raise TypeError(f"All elements of table path must be of type 'str'. Got: {path}")
+ if schema and not isinstance(schema, CaseAwareMapping):
+ assert isinstance(schema, dict)
+ schema = CaseSensitiveDict(schema)
+ return TablePath(path, schema)
+
+
+def or_(*exprs: Expr):
+ """Apply OR between a sequence of boolean expressions"""
+ exprs = args_as_tuple(exprs)
+ if len(exprs) == 1:
+ return exprs[0]
+ return BinBoolOp("OR", exprs)
+
+
+def and_(*exprs: Expr):
+ """Apply AND between a sequence of boolean expressions"""
+ exprs = args_as_tuple(exprs)
+ if len(exprs) == 1:
+ return exprs[0]
+ return BinBoolOp("AND", exprs)
+
+
+def sum_(expr: Expr):
+ """Call SUM(expr)"""
+ return Func("sum", [expr])
+
+
+def avg(expr: Expr):
+ """Call AVG(expr)"""
+ return Func("avg", [expr])
+
+
+def min_(expr: Expr):
+ """Call MIN(expr)"""
+ return Func("min", [expr])
+
+
+def max_(expr: Expr):
+ """Call MAX(expr)"""
+ return Func("max", [expr])
+
+
+def exists(expr: Expr):
+ """Call EXISTS(expr)"""
+ return Func("exists", [expr])
+
+
+def if_(cond: Expr, then: Expr, else_: Optional[Expr] = None):
+ """Conditional expression, shortcut to when-then-else.
+
+ Example:
+ ::
+
+ # SELECT CASE WHEN b THEN c ELSE d END FROM foo
+ table('foo').select(if_(b, c, d))
+ """
+ return when(cond).then(then).else_(else_)
+
+
+def when(*when_exprs: Expr):
+ """Start a when-then expression
+
+ Example:
+ ::
+
+ # SELECT CASE
+ # WHEN (type = 'text') THEN text
+ # WHEN (type = 'number') THEN number
+ # ELSE 'unknown type' END
+ # FROM foo
+ rows = table('foo').select(
+ when(this.type == 'text').then(this.text)
+ .when(this.type == 'number').then(this.number)
+ .else_('unknown type')
+ )
+ """
+ return CaseWhen([]).when(*when_exprs)
+
+
+def coalesce(*exprs):
+ "Returns a call to COALESCE"
+ exprs = args_as_tuple(exprs)
+ return Func("COALESCE", exprs)
+
+
+def insert_rows_in_batches(db, tbl: TablePath, rows, *, columns=None, batch_size=1024 * 8):
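+ """Insert rows into the table in batches, issuing one INSERT statement per `batch_size` rows."""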
+ assert batch_size > 0
+ rows = list(rows)
+
+ while rows:
+ batch, rows = rows[:batch_size], rows[batch_size:]
+ db.query(tbl.insert_rows(batch, columns=columns))
+
+
+def current_timestamp():
+ """Returns CURRENT_TIMESTAMP() or NOW()"""
+ return CurrentTimestamp()
+
+
+def code(code: str, **kw: Dict[str, Expr]) -> Code:
+ """Inline raw SQL code.
+
+ It allows users to use features and syntax that Sqeleton doesn't yet support.
+
+ It's the user's responsibility to make sure the contents of the string given to `code()` are correct and safe for execution.
+
+ Strings given to `code()` are actually templates, and can embed query expressions given as arguments:
+
+ Parameters:
+ code: template string of SQL code. Templated variables are signified with '{var}'.
+ kw: optional parameters for SQL template.
+
+ Examples:
+ ::
+
+ # SELECT b, <x> FROM tmp WHERE <y>
+ table('tmp').select(this.b, code("<x>")).where(code("<y>"))
+
+ ::
+
+ def tablesample(tbl, size):
+ return code("SELECT * FROM {tbl} TABLESAMPLE BERNOULLI ({size})", tbl=tbl, size=size)
+
+ nonzero = table('points').where(this.x > 0, this.y > 0)
+
+ # SELECT * FROM points WHERE (x > 0) AND (y > 0) TABLESAMPLE BERNOULLI (10)
+ sample_expr = tablesample(nonzero, 10)
+ """
+ return Code(code, kw)
+
+
+commit = Commit()
diff --git a/data_diff/sqeleton/queries/ast_classes.py b/data_diff/sqeleton/queries/ast_classes.py
new file mode 100644
index 00000000..7975c8fa
--- /dev/null
+++ b/data_diff/sqeleton/queries/ast_classes.py
@@ -0,0 +1,1030 @@
+from dataclasses import field
+from datetime import datetime
+from typing import Any, Generator, List, Optional, Sequence, Union, Dict
+
+from runtype import dataclass
+
+from ..utils import join_iter, ArithString
+from ..abcs import Compilable
+from ..abcs.database_types import AbstractTable
+from ..abcs.mixins import AbstractMixin_Regex, AbstractMixin_TimeTravel
+from ..schema import Schema
+
+from .compiler import Compiler, cv_params, Root, CompileError
+from .base import SKIP, DbPath, args_as_tuple, SqeletonError
+
+
+class QueryBuilderError(SqeletonError):
+ pass
+
+
+class QB_TypeError(QueryBuilderError):
+ pass
+
+
+class ExprNode(Compilable):
+ "Base class for query expression nodes"
+
+ type: Any = None
+
+ def _dfs_values(self):
+ yield self
+ for k, vs in dict(self).items(): # __dict__ provided by runtype.dataclass
+ if k == "source_table":
+ # Skip data-sources, we're only interested in data-parameters
+ continue
+ if not isinstance(vs, (list, tuple)):
+ vs = [vs]
+ for v in vs:
+ if isinstance(v, ExprNode):
+ yield from v._dfs_values()
+
+ def cast_to(self, to):
+ return Cast(self, to)
+
+
+# Query expressions can only interact with objects that are an instance of 'Expr'
+Expr = Union[ExprNode, str, bool, int, float, datetime, ArithString, None]
+
+
+@dataclass
+class Code(ExprNode, Root):
+ code: str
+ args: Dict[str, Expr] = None
+
+ def compile(self, c: Compiler) -> str:
+ if not self.args:
+ return self.code
+
+ args = {k: c.compile(v) for k, v in self.args.items()}
+ return self.code.format(**args)
+
+
+def _expr_type(e: Expr) -> type:
+ if isinstance(e, ExprNode):
+ return e.type
+ return type(e)
+
+
+@dataclass
+class Alias(ExprNode):
+ expr: Expr
+ name: str
+
+ def compile(self, c: Compiler) -> str:
+ return f"{c.compile(self.expr)} AS {c.quote(self.name)}"
+
+ @property
+ def type(self):
+ return _expr_type(self.expr)
+
+
+def _drop_skips(exprs):
+ return [e for e in exprs if e is not SKIP]
+
+
+def _drop_skips_dict(exprs_dict):
+ return {k: v for k, v in exprs_dict.items() if v is not SKIP}
+
+
+class ITable(AbstractTable):
+ source_table: Any
+ schema: Schema = None
+
+ def select(self, *exprs, distinct=SKIP, optimizer_hints=SKIP, **named_exprs) -> "ITable":
+ """Create a new table with the specified fields"""
+ exprs = args_as_tuple(exprs)
+ exprs = _drop_skips(exprs)
+ named_exprs = _drop_skips_dict(named_exprs)
+ exprs += _named_exprs_as_aliases(named_exprs)
+ resolve_names(self.source_table, exprs)
+ return Select.make(self, columns=exprs, distinct=distinct, optimizer_hints=optimizer_hints)
+
+ def where(self, *exprs):
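+ """Filter the rows according to the given predicates."""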
+ exprs = args_as_tuple(exprs)
+ exprs = _drop_skips(exprs)
+ if not exprs:
+ return self
+
+ resolve_names(self.source_table, exprs)
+ return Select.make(self, where_exprs=exprs)
+
+ def order_by(self, *exprs):
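+ """Order the rows according to the given expressions."""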
+ exprs = _drop_skips(exprs)
+ if not exprs:
+ return self
+
+ resolve_names(self.source_table, exprs)
+ return Select.make(self, order_by_exprs=exprs)
+
+ def limit(self, limit: int):
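+ """Limit the result to the first `limit` rows."""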
+ if limit is SKIP:
+ return self
+
+ return Select.make(self, limit_expr=limit)
+
+ def join(self, target: "ITable"):
+ """Join this table with the target table."""
+ return Join([self, target])
+
+ def group_by(self, *keys) -> "GroupBy":
+ """Group according to the given keys.
+
+ Must be followed by a call to :ref:``GroupBy.agg()``
+ """
+ keys = _drop_skips(keys)
+ resolve_names(self.source_table, keys)
+
+ return GroupBy(self, keys)
+
+ def _get_column(self, name: str):
+ if self.schema:
+ name = self.schema.get_key(name) # Get the actual name. Might be case-insensitive.
+ return Column(self, name)
+
+ # def __getattr__(self, column):
+ # return self._get_column(column)
+
+ def __getitem__(self, column):
+ if not isinstance(column, str):
+ raise TypeError()
+ return self._get_column(column)
+
+ def count(self):
+ return Select(self, [Count()])
+
+ def union(self, other: "ITable"):
+ """SELECT * FROM self UNION other"""
+ return TableOp("UNION", self, other)
+
+ def union_all(self, other: "ITable"):
+ """SELECT * FROM self UNION ALL other"""
+ return TableOp("UNION ALL", self, other)
+
+ def minus(self, other: "ITable"):
+ """SELECT * FROM self EXCEPT other"""
+ # aka MINUS
+ return TableOp("EXCEPT", self, other)
+
+ def intersect(self, other: "ITable"):
+ """SELECT * FROM self INTERSECT other"""
+ return TableOp("INTERSECT", self, other)
+
+
+@dataclass
+class Concat(ExprNode):
+ exprs: list
+ sep: str = None
+
+ def compile(self, c: Compiler) -> str:
+ # We coalesce because on some DBs (e.g. MySQL) concat('a', NULL) is NULL
+ items = [f"coalesce({c.compile(Code(c.dialect.to_string(c.compile(expr))))}, '')" for expr in self.exprs]
+ assert items
+ if len(items) == 1:
+ return items[0]
+
+ if self.sep:
+ items = list(join_iter(f"'{self.sep}'", items))
+ return c.dialect.concat(items)
+
+
+@dataclass
+class Count(ExprNode):
+ expr: Expr = None
+ distinct: bool = False
+
+ type = int
+
+ def compile(self, c: Compiler) -> str:
+ expr = c.compile(self.expr) if self.expr else "*"
+ if self.distinct:
+ return f"count(distinct {expr})"
+
+ return f"count({expr})"
+
+
+class LazyOps:
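+ """Mixin that overloads Python operators (+, -, comparisons, & and |) to build query expression nodes instead of evaluating eagerly."""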
+ def __add__(self, other):
+ return BinOp("+", [self, other])
+
+ def __sub__(self, other):
+ return BinOp("-", [self, other])
+
+ def __neg__(self):
+ return UnaryOp("-", self)
+
+ def __gt__(self, other):
+ return BinBoolOp(">", [self, other])
+
+ def __ge__(self, other):
+ return BinBoolOp(">=", [self, other])
+
+ def __eq__(self, other):
+ if other is None:
+ return BinBoolOp("IS", [self, None])
+ return BinBoolOp("=", [self, other])
+
+ def __lt__(self, other):
+ return BinBoolOp("<", [self, other])
+
+ def __le__(self, other):
+ return BinBoolOp("<=", [self, other])
+
+ def __or__(self, other):
+ return BinBoolOp("OR", [self, other])
+
+ def __and__(self, other):
+ return BinBoolOp("AND", [self, other])
+
+ def is_distinct_from(self, other):
+ return IsDistinctFrom(self, other)
+
+ def like(self, other):
+ return BinBoolOp("LIKE", [self, other])
+
+ def test_regex(self, other):
+ return TestRegex(self, other)
+
+ def sum(self):
+ return Func("SUM", [self])
+
+ def max(self):
+ return Func("MAX", [self])
+
+ def min(self):
+ return Func("MIN", [self])
+
+
+@dataclass
+class TestRegex(ExprNode, LazyOps):
+ string: Expr
+ pattern: Expr
+
+ def compile(self, c: Compiler) -> str:
+ if not isinstance(c.dialect, AbstractMixin_Regex):
+ raise NotImplementedError(f"No regex implementation for database '{c.database}'")
+ regex = c.dialect.test_regex(self.string, self.pattern)
+ return c.compile(regex)
+
+
+@dataclass(eq=False)
+class Func(ExprNode, LazyOps):
+ name: str
+ args: Sequence[Expr]
+
+ def compile(self, c: Compiler) -> str:
+ args = ", ".join(c.compile(e) for e in self.args)
+ return f"{self.name}({args})"
+
+
+@dataclass
+class WhenThen(ExprNode):
+ when: Expr
+ then: Expr
+
+ def compile(self, c: Compiler) -> str:
+ return f"WHEN {c.compile(self.when)} THEN {c.compile(self.then)}"
+
+
+@dataclass
+class CaseWhen(ExprNode):
+ cases: Sequence[WhenThen]
+ else_expr: Expr = None
+
+ def compile(self, c: Compiler) -> str:
+ assert self.cases
+ when_thens = " ".join(c.compile(case) for case in self.cases)
+ else_expr = (" ELSE " + c.compile(self.else_expr)) if self.else_expr is not None else ""
+ return f"CASE {when_thens}{else_expr} END"
+
+ @property
+ def type(self):
+ then_types = {_expr_type(case.then) for case in self.cases}
+ if self.else_expr:
+ then_types |= {_expr_type(self.else_expr)}
+ if len(then_types) > 1:
+ raise QB_TypeError(f"Non-matching types in when: {then_types}")
+ (t,) = then_types
+ return t
+
+ def when(self, *whens: Expr) -> "QB_When":
+ """Add a new 'when' clause to the case expression
+
+ Must be followed by a call to `.then()`
+ """
+ whens = args_as_tuple(whens)
+ whens = _drop_skips(whens)
+ if not whens:
+ raise QueryBuilderError("Expected valid whens")
+
+ # XXX reimplementing api.and_()
+ if len(whens) == 1:
+ return QB_When(self, whens[0])
+ return QB_When(self, BinBoolOp("AND", whens))
+
+ def else_(self, then: Expr):
+ """Add an 'else' clause to the case expression.
+
+ Can only be called once!
+ """
+ if self.else_expr is not None:
+ raise QueryBuilderError(f"Else clause already specified in {self}")
+
+ return self.replace(else_expr=then)
+
+
+@dataclass
+class QB_When:
+ "Partial case-when, used for query-building"
+ casewhen: CaseWhen
+ when: Expr
+
+ def then(self, then: Expr) -> CaseWhen:
+ """Add a 'then' clause after a 'when' was added."""
+ case = WhenThen(self.when, then)
+ return self.casewhen.replace(cases=self.casewhen.cases + [case])
+
+
+@dataclass(eq=False, order=False)
+class IsDistinctFrom(ExprNode, LazyOps):
+ a: Expr
+ b: Expr
+ type = bool
+
+ def compile(self, c: Compiler) -> str:
+ a = c.dialect.to_comparable(c.compile(self.a), self.a.type)
+ b = c.dialect.to_comparable(c.compile(self.b), self.b.type)
+ return c.dialect.is_distinct_from(a, b)
+
+
+@dataclass(eq=False, order=False)
+class BinOp(ExprNode, LazyOps):
+ op: str
+ args: Sequence[Expr]
+
+ def compile(self, c: Compiler) -> str:
+ expr = f" {self.op} ".join(c.compile(a) for a in self.args)
+ return f"({expr})"
+
+ @property
+ def type(self):
+ types = {_expr_type(i) for i in self.args}
+ if len(types) > 1:
+ raise TypeError(f"Expected all args to have the same type, got {types}")
+ (t,) = types
+ return t
+
+
+@dataclass
+class UnaryOp(ExprNode, LazyOps):
+ op: str
+ expr: Expr
+
+ def compile(self, c: Compiler) -> str:
+ return f"({self.op}{c.compile(self.expr)})"
+
+
+class BinBoolOp(BinOp):
+ type = bool
+
+
+@dataclass(eq=False, order=False)
+class Column(ExprNode, LazyOps):
+ source_table: ITable
+ name: str
+
+ @property
+ def type(self):
+ if self.source_table.schema is None:
+ raise QueryBuilderError(f"Schema required for table {self.source_table}")
+ return self.source_table.schema[self.name]
+
+ def compile(self, c: Compiler) -> str:
+ if c._table_context:
+ if len(c._table_context) > 1:
+ aliases = [
+ t for t in c._table_context if isinstance(t, TableAlias) and t.source_table is self.source_table
+ ]
+ if not aliases:
+ return c.quote(self.name)
+ elif len(aliases) > 1:
+ raise CompileError(f"Too many aliases for column {self.name}")
+ (alias,) = aliases
+
+ return f"{c.quote(alias.name)}.{c.quote(self.name)}"
+
+ return c.quote(self.name)
+
+
+@dataclass
+class TablePath(ExprNode, ITable):
+ path: DbPath
+ schema: Optional[Schema] = field(default=None, repr=False)
+
+ @property
+ def source_table(self):
+ return self
+
+ def compile(self, c: Compiler) -> str:
+ path = self.path # c.database._normalize_table_path(self.name)
+ return ".".join(map(c.quote, path))
+
+ # Statement shorthands
+ def create(self, source_table: ITable = None, *, if_not_exists: bool = False, primary_keys: List[str] = None):
+ """Returns a query expression to create a new table.
+
+ Parameters:
+ source_table: a table expression to use for initializing the table.
+ If not provided, the table must have a schema specified.
+ if_not_exists: Add an 'if not exists' clause or not. (note: not all dbs support it!)
+ primary_keys: List of column names which define the primary key
+ """
+
+ if source_table is None and not self.schema:
+ raise ValueError("Either schema or source table needed to create table")
+ if isinstance(source_table, TablePath):
+ source_table = source_table.select()
+ return CreateTable(self, source_table, if_not_exists=if_not_exists, primary_keys=primary_keys)
+
+ def drop(self, if_exists=False):
+ """Returns a query expression to delete the table.
+
+ Parameters:
+ if_exists: Add an 'if exists' clause or not. (note: not all dbs support it!)
+ """
+ return DropTable(self, if_exists=if_exists)
+
+ def truncate(self):
+ """Returns a query expression to truncate the table. (remove all rows)"""
+ return TruncateTable(self)
+
+ def insert_rows(self, rows: Sequence, *, columns: List[str] = None):
+ """Returns a query expression to insert rows to the table, given as Python values.
+
+ Parameters:
+ rows: A list of tuples. Must all have the same width.
+ columns: Names of columns being populated. If specified, must have the same length as the tuples.
+ """
+ rows = list(rows)
+ return InsertToTable(self, ConstantTable(rows), columns=columns)
+
+ def insert_row(self, *values, columns: List[str] = None):
+ """Returns a query expression to insert a single row to the table, given as Python values.
+
+ Parameters:
+ columns: Names of columns being populated. If specified, must have the same length as 'values'
+ """
+ return InsertToTable(self, ConstantTable([values]), columns=columns)
+
+ def insert_expr(self, expr: Expr):
+ """Returns a query expression to insert rows to the table, given as a query expression.
+
+ Parameters:
+ expr: query expression from which to read the rows
+ """
+ if isinstance(expr, TablePath):
+ expr = expr.select()
+ return InsertToTable(self, expr)
+
+ def time_travel(
+ self, *, before: bool = False, timestamp: datetime = None, offset: int = None, statement: str = None
+ ) -> Compilable:
+ """Selects historical data from the table
+
+ Parameters:
+ before: If false, inclusive of the specified point in time.
+ If True, only return the time before it. (at/before)
+ timestamp: A constant timestamp
+ offset: the time 'offset' seconds before now
+ statement: identifier for statement, e.g. query ID
+
+ Must specify exactly one of `timestamp`, `offset` or `statement`.
+ """
+ if sum(int(i is not None) for i in (timestamp, offset, statement)) != 1:
+ raise ValueError("Must specify exactly one of `timestamp`, `offset` or `statement`.")
+
+ if timestamp is not None:
+ assert offset is None and statement is None
+
+ return TimeTravel(self, before=before, timestamp=timestamp, offset=offset, statement=statement)
+
+
+@dataclass
+class TableAlias(ExprNode, ITable):
+ source_table: ITable
+ name: str
+
+ def compile(self, c: Compiler) -> str:
+ return f"{c.compile(self.source_table)} {c.quote(self.name)}"
+
+
+@dataclass
+class Join(ExprNode, ITable, Root):
+ source_tables: Sequence[ITable]
+ op: str = None
+ on_exprs: Sequence[Expr] = None
+ columns: Sequence[Expr] = None
+
+ @property
+ def source_table(self):
+ return self
+
+ @property
+ def schema(self):
+ assert self.columns # TODO Implement SELECT *
+ s = self.source_tables[0].schema # TODO validate types match between both tables
+ return type(s)({c.name: c.type for c in self.columns})
+
+ def on(self, *exprs) -> "Join":
+ """Add an ON clause, for filtering the result of the cartesian product (i.e. the JOIN)"""
+ if len(exprs) == 1:
+ (e,) = exprs
+ if isinstance(e, Generator):
+ exprs = tuple(e)
+
+ exprs = _drop_skips(exprs)
+ if not exprs:
+ return self
+
+ return self.replace(on_exprs=(self.on_exprs or []) + exprs)
+
+ def select(self, *exprs, **named_exprs) -> ITable:
+ """Select fields to return from the JOIN operation
+
+ See Also: ``ITable.select()``
+ """
+ if self.columns is not None:
+ # join-select already applied
+ return super().select(*exprs, **named_exprs)
+
+ exprs = _drop_skips(exprs)
+ named_exprs = _drop_skips_dict(named_exprs)
+ exprs += _named_exprs_as_aliases(named_exprs)
+ resolve_names(self.source_table, exprs)
+ # TODO Ensure exprs <= self.columns ?
+ return self.replace(columns=exprs)
+
+ def compile(self, parent_c: Compiler) -> str:
+ tables = [
+ t if isinstance(t, TableAlias) else TableAlias(t, parent_c.new_unique_name()) for t in self.source_tables
+ ]
+ c = parent_c.add_table_context(*tables, in_join=True, in_select=False)
+ op = " JOIN " if self.op is None else f" {self.op} JOIN "
+ joined = op.join(c.compile(t) for t in tables)
+
+ if self.on_exprs:
+ on = " AND ".join(c.compile(e) for e in self.on_exprs)
+ res = f"{joined} ON {on}"
+ else:
+ res = joined
+
+ columns = "*" if self.columns is None else ", ".join(map(c.compile, self.columns))
+ select = f"SELECT {columns} FROM {res}"
+
+ if parent_c.in_select:
+ select = f"({select}) {c.new_unique_name()}"
+ elif parent_c.in_join:
+ select = f"({select})"
+ return select
+
+
+@dataclass
+class GroupBy(ExprNode, ITable, Root):
+ table: ITable
+ keys: Sequence[Expr] = None # IKey?
+ values: Sequence[Expr] = None
+ having_exprs: Sequence[Expr] = None
+
+ @property
+ def source_table(self):
+ return self
+
+ def __post_init__(self):
+ assert self.keys or self.values
+
+ def having(self, *exprs):
+ """Add a 'HAVING' clause to the group-by"""
+ exprs = args_as_tuple(exprs)
+ exprs = _drop_skips(exprs)
+ if not exprs:
+ return self
+
+ resolve_names(self.table, exprs)
+ return self.replace(having_exprs=(self.having_exprs or []) + exprs)
+
+ def agg(self, *exprs):
+ """Select aggregated fields for the group-by."""
+ exprs = args_as_tuple(exprs)
+ exprs = _drop_skips(exprs)
+ resolve_names(self.table, exprs)
+ return self.replace(values=(self.values or []) + exprs)
+
+ def compile(self, c: Compiler) -> str:
+ if self.values is None:
+ raise CompileError(".group_by() must be followed by a call to .agg()")
+
+ keys = [str(i + 1) for i in range(len(self.keys))]
+ columns = (self.keys or []) + (self.values or [])
+ if isinstance(self.table, Select) and self.table.columns is None and self.table.group_by_exprs is None:
+ return c.compile(
+ self.table.replace(
+ columns=columns,
+ group_by_exprs=[Code(k) for k in keys],
+ having_exprs=self.having_exprs,
+ )
+ )
+
+ keys_str = ", ".join(keys)
+ columns_str = ", ".join(c.compile(x) for x in columns)
+ having_str = (
+ " HAVING " + " AND ".join(map(c.compile, self.having_exprs)) if self.having_exprs is not None else ""
+ )
+ select = (
+ f"SELECT {columns_str} FROM {c.replace(in_select=True).compile(self.table)} GROUP BY {keys_str}{having_str}"
+ )
+
+ if c.in_select:
+ select = f"({select}) {c.new_unique_name()}"
+ elif c.in_join:
+ select = f"({select})"
+ return select
+
+
+@dataclass
+class TableOp(ExprNode, ITable, Root):
+ op: str
+ table1: ITable
+ table2: ITable
+
+ @property
+ def source_table(self):
+ return self
+
+ @property
+ def type(self):
+ # TODO ensure types of both tables are compatible
+ return self.table1.type
+
+ @property
+ def schema(self):
+ s1 = self.table1.schema
+ s2 = self.table2.schema
+ assert len(s1) == len(s2)
+ return s1
+
+ def compile(self, parent_c: Compiler) -> str:
+ c = parent_c.replace(in_select=False)
+ table_expr = f"{c.compile(self.table1)} {self.op} {c.compile(self.table2)}"
+ if parent_c.in_select:
+ table_expr = f"({table_expr}) {c.new_unique_name()}"
+ elif parent_c.in_join:
+ table_expr = f"({table_expr})"
+ return table_expr
+
+
+@dataclass
+class Select(ExprNode, ITable, Root):
+ table: Expr = None
+ columns: Sequence[Expr] = None
+ where_exprs: Sequence[Expr] = None
+ order_by_exprs: Sequence[Expr] = None
+ group_by_exprs: Sequence[Expr] = None
+ having_exprs: Sequence[Expr] = None
+ limit_expr: int = None
+ distinct: bool = False
+ optimizer_hints: Sequence[Expr] = None
+
+ @property
+ def schema(self):
+ s = self.table.schema
+ if s is None or self.columns is None:
+ return s
+ return type(s)({c.name: c.type for c in self.columns})
+
+ @property
+ def source_table(self):
+ return self
+
+ def compile(self, parent_c: Compiler) -> str:
+ c = parent_c.replace(in_select=True) # .add_table_context(self.table)
+
+ columns = ", ".join(map(c.compile, self.columns)) if self.columns else "*"
+ distinct = "DISTINCT " if self.distinct else ""
+ optimizer_hints = c.dialect.optimizer_hints(self.optimizer_hints) if self.optimizer_hints else ""
+ select = f"SELECT {optimizer_hints}{distinct}{columns}"
+
+ if self.table:
+ select += " FROM " + c.compile(self.table)
+ elif c.dialect.PLACEHOLDER_TABLE:
+ select += f" FROM {c.dialect.PLACEHOLDER_TABLE}"
+
+ if self.where_exprs:
+ select += " WHERE " + " AND ".join(map(c.compile, self.where_exprs))
+
+ if self.group_by_exprs:
+ select += " GROUP BY " + ", ".join(map(c.compile, self.group_by_exprs))
+
+ if self.having_exprs:
+ assert self.group_by_exprs
+ select += " HAVING " + " AND ".join(map(c.compile, self.having_exprs))
+
+ if self.order_by_exprs:
+ select += " ORDER BY " + ", ".join(map(c.compile, self.order_by_exprs))
+
+ if self.limit_expr is not None:
+ select += " " + c.dialect.offset_limit(0, self.limit_expr)
+
+ if parent_c.in_select:
+ select = f"({select}) {c.new_unique_name()}"
+ elif parent_c.in_join:
+ select = f"({select})"
+ return select
+
+ @classmethod
+ def make(cls, table: ITable, distinct: bool = SKIP, optimizer_hints: str = SKIP, **kwargs):
+ assert "table" not in kwargs
+
+ if not isinstance(table, cls): # If not Select
+ if distinct is not SKIP:
+ kwargs["distinct"] = distinct
+ if optimizer_hints is not SKIP:
+ kwargs["optimizer_hints"] = optimizer_hints
+ return cls(table, **kwargs)
+
+ # We can safely assume isinstance(table, Select)
+ if optimizer_hints is not SKIP:
+ kwargs["optimizer_hints"] = optimizer_hints
+
+ if distinct is not SKIP:
+ if distinct == False and table.distinct:
+ return cls(table, **kwargs)
+ kwargs["distinct"] = distinct
+
+ if table.limit_expr or table.group_by_exprs:
+ return cls(table, **kwargs)
+
+ # Fill in missing attributes, instead of nesting instances
+ for k, v in kwargs.items():
+ if getattr(table, k) is not None:
+ if k == "where_exprs": # Additive attribute
+ kwargs[k] = getattr(table, k) + v
+ elif k in ["distinct", "optimizer_hints"]:
+ pass
+ else:
+ raise ValueError(k)
+
+ return table.replace(**kwargs)
+
+
+@dataclass
+class Cte(ExprNode, ITable):
+ source_table: Expr
+ name: str = None
+ params: Sequence[str] = None
+
+ def compile(self, parent_c: Compiler) -> str:
+ c = parent_c.replace(_table_context=[], in_select=False)
+ compiled = c.compile(self.source_table)
+
+ name = self.name or parent_c.new_unique_name()
+ name_params = f"{name}({', '.join(self.params)})" if self.params else name
+ parent_c._subqueries[name_params] = compiled
+
+ return name
+
+ @property
+ def schema(self):
+ # TODO add cte to schema
+ return self.source_table.schema
+
+
+def _named_exprs_as_aliases(named_exprs):
+ return [Alias(expr, name) for name, expr in named_exprs.items()]
+
+
+def resolve_names(source_table, exprs):
+ i = 0
+ for expr in exprs:
+ # Iterate recursively and update _ResolveColumn instances with the right expression
+ if isinstance(expr, ExprNode):
+ for v in expr._dfs_values():
+ if isinstance(v, _ResolveColumn):
+ v.resolve(source_table._get_column(v.resolve_name))
+ i += 1
+
+
+@dataclass(frozen=False, eq=False, order=False)
+class _ResolveColumn(ExprNode, LazyOps):
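+ """Placeholder for a column accessed via `this`, resolved to a concrete Column once the source table is known."""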
+ resolve_name: str
+ resolved: Expr = None
+
+ def resolve(self, expr: Expr):
+ if self.resolved is not None:
+ raise QueryBuilderError("Already resolved!")
+ self.resolved = expr
+
+ def _get_resolved(self) -> Expr:
+ if self.resolved is None:
+ raise QueryBuilderError(f"Column not resolved: {self.resolve_name}")
+ return self.resolved
+
+ def compile(self, c: Compiler) -> str:
+ return self._get_resolved().compile(c)
+
+ @property
+ def type(self):
+ return self._get_resolved().type
+
+ @property
+ def name(self):
+ return self._get_resolved().name
+
+
+class This:
+ """Builder object for accessing table attributes.
+
+ Automatically evaluates to the 'top-most' table during compilation.
+ """
+
+ def __getattr__(self, name):
+ return _ResolveColumn(name)
+
+ def __getitem__(self, name):
+ if isinstance(name, (list, tuple)):
+ return [_ResolveColumn(n) for n in name]
+ return _ResolveColumn(name)
+
+
+@dataclass
+class In(ExprNode):
+ expr: Expr
+ list: Sequence[Expr]
+
+ type = bool
+
+ def compile(self, c: Compiler):
+ elems = ", ".join(map(c.compile, self.list))
+ return f"({c.compile(self.expr)} IN ({elems}))"
+
+
+@dataclass
+class Cast(ExprNode):
+ expr: Expr
+ target_type: Expr
+
+ def compile(self, c: Compiler) -> str:
+ return f"cast({c.compile(self.expr)} as {c.compile(self.target_type)})"
+
+
+@dataclass
+class Random(ExprNode, LazyOps):
+ type = float
+
+ def compile(self, c: Compiler) -> str:
+ return c.dialect.random()
+
+
+@dataclass
+class ConstantTable(ExprNode):
+ rows: Sequence[Sequence]
+
+ def compile(self, c: Compiler) -> str:
+ raise NotImplementedError()
+
+ def compile_for_insert(self, c: Compiler):
+ return c.dialect.constant_values(self.rows)
+
+
+@dataclass
+class Explain(ExprNode, Root):
+ select: Select
+
+ type = str
+
+ def compile(self, c: Compiler) -> str:
+ return c.dialect.explain_as_text(c.compile(self.select))
+
+
+class CurrentTimestamp(ExprNode):
+ type = datetime
+
+ def compile(self, c: Compiler) -> str:
+ return c.dialect.current_timestamp()
+
+
+@dataclass
+class TimeTravel(ITable):
+ table: TablePath
+ before: bool = False
+ timestamp: datetime = None
+ offset: int = None
+ statement: str = None
+
+ def compile(self, c: Compiler) -> str:
+ assert isinstance(c, AbstractMixin_TimeTravel)
+ return c.compile(
+ c.time_travel(
+ self.table, before=self.before, timestamp=self.timestamp, offset=self.offset, statement=self.statement
+ )
+ )
+
+
+# DDL
+
+
+class Statement(Compilable, Root):
+ type = None
+
+
+@dataclass
+class CreateTable(Statement):
+ path: TablePath
+ source_table: Expr = None
+ if_not_exists: bool = False
+ primary_keys: List[str] = None
+
+ def compile(self, c: Compiler) -> str:
+ ne = "IF NOT EXISTS " if self.if_not_exists else ""
+ if self.source_table:
+ return f"CREATE TABLE {ne}{c.compile(self.path)} AS {c.compile(self.source_table)}"
+
+ schema = ", ".join(f"{c.dialect.quote(k)} {c.dialect.type_repr(v)}" for k, v in self.path.schema.items())
+ pks = (
+ ", PRIMARY KEY (%s)" % ", ".join(self.primary_keys)
+ if self.primary_keys and c.dialect.SUPPORTS_PRIMARY_KEY
+ else ""
+ )
+ return f"CREATE TABLE {ne}{c.compile(self.path)}({schema}{pks})"
+
+
+@dataclass
+class DropTable(Statement):
+ path: TablePath
+ if_exists: bool = False
+
+ def compile(self, c: Compiler) -> str:
+ ie = "IF EXISTS " if self.if_exists else ""
+ return f"DROP TABLE {ie}{c.compile(self.path)}"
+
+
+@dataclass
+class TruncateTable(Statement):
+ path: TablePath
+
+ def compile(self, c: Compiler) -> str:
+ return f"TRUNCATE TABLE {c.compile(self.path)}"
+
+
+@dataclass
+class InsertToTable(Statement):
+ path: TablePath
+ expr: Expr
+ columns: List[str] = None
+ returning_exprs: List[str] = None
+
+ def compile(self, c: Compiler) -> str:
+ if isinstance(self.expr, ConstantTable):
+ expr = self.expr.compile_for_insert(c)
+ else:
+ expr = c.compile(self.expr)
+
+ columns = "(%s)" % ", ".join(map(c.quote, self.columns)) if self.columns is not None else ""
+
+ return f"INSERT INTO {c.compile(self.path)}{columns} {expr}"
+
+ def returning(self, *exprs):
+ """Add a 'RETURNING' clause to the insert expression.
+
+ Note: Not all databases support this feature!
+ """
+ if self.returning_exprs:
+ raise ValueError("A returning clause is already specified")
+
+ exprs = args_as_tuple(exprs)
+ exprs = _drop_skips(exprs)
+ if not exprs:
+ return self
+
+ resolve_names(self.path, exprs)
+ return self.replace(returning_exprs=exprs)
+
+
+@dataclass
+class Commit(Statement):
+ """Generate a COMMIT statement, if we're in the middle of a transaction, or in auto-commit. Otherwise SKIP."""
+
+ def compile(self, c: Compiler) -> str:
+ return "COMMIT" if not c.database.is_autocommit else SKIP
+
+
+@dataclass
+class Param(ExprNode, ITable):
+ """A value placeholder, to be specified at compilation time using the `cv_params` context variable."""
+
+ name: str
+
+ @property
+ def source_table(self):
+ return self
+
+ def compile(self, c: Compiler) -> str:
+ params = cv_params.get()
+ return c._compile(params[self.name])
diff --git a/data_diff/sqeleton/queries/base.py b/data_diff/sqeleton/queries/base.py
new file mode 100644
index 00000000..cac8804e
--- /dev/null
+++ b/data_diff/sqeleton/queries/base.py
@@ -0,0 +1,24 @@
+from typing import Generator
+
+from ..abcs import DbPath, DbKey
+from ..schema import Schema
+
+
+class _SKIP:
+ def __repr__(self):
+ return "SKIP"
+
+
+SKIP = _SKIP()
+
+
+class SqeletonError(Exception):
+ pass
+
+
+def args_as_tuple(exprs):
+ if len(exprs) == 1:
+ (e,) = exprs
+ if isinstance(e, Generator):
+ return tuple(e)
+ return exprs
diff --git a/data_diff/sqeleton/queries/compiler.py b/data_diff/sqeleton/queries/compiler.py
new file mode 100644
index 00000000..ebd39e93
--- /dev/null
+++ b/data_diff/sqeleton/queries/compiler.py
@@ -0,0 +1,85 @@
+import random
+from datetime import datetime
+from typing import Any, Dict, Sequence, List
+
+from runtype import dataclass
+
+from ..utils import ArithString
+from ..abcs import AbstractDatabase, AbstractDialect, DbPath, AbstractCompiler, Compilable
+
+import contextvars
+
+cv_params = contextvars.ContextVar("params")
+
+
+class CompileError(Exception):
+ pass
+
+
+class Root:
+ "Nodes inheriting from Root can be used as root statements in SQL (e.g. SELECT yes, RANDOM() no)"
+
+
+@dataclass
+class Compiler(AbstractCompiler):
+ database: AbstractDatabase
+ params: dict = {}
+ in_select: bool = False # Compilation runtime flag
+ in_join: bool = False # Compilation runtime flag
+
+ _table_context: List = [] # List[ITable]
+ _subqueries: Dict[str, Any] = {} # XXX not thread-safe
+ root: bool = True
+
+ _counter: List = [0]
+
+ @property
+ def dialect(self) -> AbstractDialect:
+ return self.database.dialect
+
+ def compile(self, elem, params=None) -> str:
+ if params:
+ cv_params.set(params)
+
+ if self.root and isinstance(elem, Compilable) and not isinstance(elem, Root):
+ from .ast_classes import Select
+
+ elem = Select(columns=[elem])
+
+ res = self._compile(elem)
+ if self.root and self._subqueries:
+ subq = ", ".join(f"\n {k} AS ({v})" for k, v in self._subqueries.items())
+ self._subqueries.clear()
+ return f"WITH {subq}\n{res}"
+ return res
+
+ def _compile(self, elem) -> str:
+ if elem is None:
+ return "NULL"
+ elif isinstance(elem, Compilable):
+ return elem.compile(self.replace(root=False))
+ elif isinstance(elem, str):
+ return f"'{elem}'"
+ elif isinstance(elem, (int, float)):
+ return str(elem)
+ elif isinstance(elem, datetime):
+ return self.dialect.timestamp_value(elem)
+ elif isinstance(elem, bytes):
+ return f"b'{elem.decode()}'"
+ elif isinstance(elem, ArithString):
+ return f"'{elem}'"
+ assert False, elem
+
+ def new_unique_name(self, prefix="tmp"):
+ self._counter[0] += 1
+ return f"{prefix}{self._counter[0]}"
+
+ def new_unique_table_name(self, prefix="tmp") -> DbPath:
+ self._counter[0] += 1
+ return self.database.parse_table_name(f"{prefix}{self._counter[0]}_{'%x'%random.randrange(2**32)}")
+
+ def add_table_context(self, *tables: Sequence, **kw):
+ return self.replace(_table_context=self._table_context + list(tables), **kw)
+
+ def quote(self, s: str):
+ return self.dialect.quote(s)
diff --git a/data_diff/sqeleton/queries/extras.py b/data_diff/sqeleton/queries/extras.py
new file mode 100644
index 00000000..1014c372
--- /dev/null
+++ b/data_diff/sqeleton/queries/extras.py
@@ -0,0 +1,62 @@
+"Useful AST classes that don't quite fall within the scope of regular SQL"
+
+from typing import Callable, Sequence
+from runtype import dataclass
+
+from ..abcs.database_types import ColType, Native_UUID
+
+from .compiler import Compiler
+from .ast_classes import Expr, ExprNode, Concat, Code
+
+
+@dataclass
+class NormalizeAsString(ExprNode):
+ expr: ExprNode
+ expr_type: ColType = None
+ type = str
+
+ def compile(self, c: Compiler) -> str:
+ expr = c.compile(self.expr)
+ return c.dialect.normalize_value_by_type(expr, self.expr_type or self.expr.type)
+
+
+@dataclass
+class ApplyFuncAndNormalizeAsString(ExprNode):
+ expr: ExprNode
+ apply_func: Callable = None
+
+ def compile(self, c: Compiler) -> str:
+ expr = self.expr
+ expr_type = expr.type
+
+ if isinstance(expr_type, Native_UUID):
+ # Normalize first, apply template after (for uuids)
+ # Needed because min/max(uuid) fails in postgresql
+ expr = NormalizeAsString(expr, expr_type)
+ if self.apply_func is not None:
+ expr = self.apply_func(expr) # Apply template using Python's string formatting
+
+ else:
+ # Apply template before normalizing (for ints)
+ if self.apply_func is not None:
+ expr = self.apply_func(expr) # Apply template using Python's string formatting
+ expr = NormalizeAsString(expr, expr_type)
+
+ return c.compile(expr)
+
+
+@dataclass
+class Checksum(ExprNode):
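+ """Row-set checksum: concatenates the given expressions with '|' (coalescing NULLs), hashes each row with md5_as_int, and sums the result."""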
+ exprs: Sequence[Expr]
+
+ def compile(self, c: Compiler):
+ if len(self.exprs) > 1:
+ exprs = [Code(f"coalesce({c.compile(expr)}, '')") for expr in self.exprs]
+ # exprs = [c.compile(e) for e in exprs]
+ expr = Concat(exprs, "|")
+ else:
+ # No need to coalesce - safe to assume that key cannot be null
+ (expr,) = self.exprs
+ expr = c.compile(expr)
+ md5 = c.dialect.md5_as_int(expr)
+ return f"sum({md5})"
diff --git a/data_diff/sqeleton/query_utils.py b/data_diff/sqeleton/query_utils.py
new file mode 100644
index 00000000..4b963039
--- /dev/null
+++ b/data_diff/sqeleton/query_utils.py
@@ -0,0 +1,54 @@
+"Module for query utilities that didn't make it into the query-builder (yet)"
+
+from contextlib import suppress
+
+from data_diff.sqeleton.databases import DbPath, QueryError, Oracle
+from data_diff.sqeleton.queries import table, commit, Expr
+
+
+def _drop_table_oracle(name: DbPath):
+ t = table(name)
+ # Experience shows double drop is necessary
+ with suppress(QueryError):
+ yield t.drop()
+ yield t.drop()
+ yield commit
+
+
+def _drop_table(name: DbPath):
+ t = table(name)
+ yield t.drop(if_exists=True)
+ yield commit
+
+
+def drop_table(db, tbl):
+ if isinstance(db, Oracle):
+ db.query(_drop_table_oracle(tbl))
+ else:
+ db.query(_drop_table(tbl))
+
+
+def _append_to_table_oracle(path: DbPath, expr: Expr):
+ """See append_to_table"""
+ assert expr.schema, expr
+ t = table(path, schema=expr.schema)
+ with suppress(QueryError):
+ yield t.create() # uses expr.schema
+ yield commit
+ yield t.insert_expr(expr)
+ yield commit
+
+
+def _append_to_table(path: DbPath, expr: Expr):
+ """Append to table"""
+ assert expr.schema, expr
+ t = table(path, schema=expr.schema)
+ yield t.create(if_not_exists=True) # uses expr.schema
+ yield commit
+ yield t.insert_expr(expr)
+ yield commit
+
+
+def append_to_table(db, path, expr):
+ f = _append_to_table_oracle if isinstance(db, Oracle) else _append_to_table
+ db.query(f(path, expr))
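+
+
+# Usage sketch (the table names are hypothetical):
+#   drop_table(db, ("my_schema", "tmp_results"))
+#   append_to_table(db, ("my_schema", "results"), select_expr)  # select_expr must carry a .schema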
diff --git a/data_diff/sqeleton/repl.py b/data_diff/sqeleton/repl.py
index 9905052f..a3d8b15c 100644
--- a/data_diff/sqeleton/repl.py
+++ b/data_diff/sqeleton/repl.py
@@ -4,7 +4,6 @@
# logging.basicConfig(level=logging.DEBUG)
from . import connect
-from .queries import table
import sys
@@ -27,12 +26,15 @@ def help():
rich.print("Otherwise, runs regular SQL query")
-def main(uri):
+def repl(uri):
db = connect(uri)
db_name = db.name
while True:
- q = input(f"{db_name}> ").strip()
+ try:
+ q = input(f"{db_name}> ").strip()
+ except EOFError:
+ return
if not q:
continue
if q.startswith("*"):
@@ -45,21 +47,28 @@ def main(uri):
help()
continue
try:
- schema = db.query_table_schema((table_name,))
+ path = db.parse_table_name(table_name)
+ print("->", path)
+ schema = db.query_table_schema(path)
except Exception as e:
logging.error(e)
else:
print_table([(k, v[1]) for k, v in schema.items()], ["name", "type"], f"Table '{table_name}'")
else:
+ # Normal SQL query
try:
res = db.query(q)
except Exception as e:
logging.error(e)
else:
if res:
- print_table(res, [str(i) for i in range(len(res[0]))], q)
+ print_table(res.rows, res.columns, None)
-if __name__ == "__main__":
+def main():
uri = sys.argv[1]
+ return repl(uri)
+
+
+if __name__ == "__main__":
main()
diff --git a/data_diff/sqeleton/schema.py b/data_diff/sqeleton/schema.py
new file mode 100644
index 00000000..ddf7e786
--- /dev/null
+++ b/data_diff/sqeleton/schema.py
@@ -0,0 +1,20 @@
+import logging
+
+from .utils import CaseAwareMapping, CaseInsensitiveDict, CaseSensitiveDict
+from .abcs import AbstractDatabase, DbPath
+
+logger = logging.getLogger("schema")
+
+Schema = CaseAwareMapping
+
+
+def create_schema(db: AbstractDatabase, table_path: DbPath, schema: dict, case_sensitive: bool) -> CaseAwareMapping:
+ logger.debug(f"[{db.name}] Schema = {schema}")
+
+ if case_sensitive:
+ return CaseSensitiveDict(schema)
+
+ if len({k.lower() for k in schema}) < len(schema):
+ logger.warning(f'Ambiguous schema for {db}:{".".join(table_path)} | Columns = {", ".join(list(schema))}')
+ logger.warning("We recommend disabling case-insensitivity (set --case-sensitive).")
+ return CaseInsensitiveDict(schema)
diff --git a/data_diff/sqeleton/utils.py b/data_diff/sqeleton/utils.py
new file mode 100644
index 00000000..971ff155
--- /dev/null
+++ b/data_diff/sqeleton/utils.py
@@ -0,0 +1,341 @@
+from typing import (
+ Iterable,
+ Iterator,
+ MutableMapping,
+ Union,
+ Any,
+ Sequence,
+ Dict,
+ Hashable,
+ TypeVar,
+ TYPE_CHECKING,
+ List,
+)
+from abc import abstractmethod
+from weakref import ref
+import math
+import string
+import re
+from uuid import UUID
+from urllib.parse import urlparse
+
+# -- Common --
+
+try:
+ from typing import Self
+except ImportError:
+ Self = Any
+
+
+class WeakCache:
+ def __init__(self):
+ self._cache = {}
+
+ def _hashable_key(self, k: Union[dict, Hashable]) -> Hashable:
+ if isinstance(k, dict):
+ return tuple(k.items())
+ return k
+
+ def add(self, key: Union[dict, Hashable], value: Any):
+ key = self._hashable_key(key)
+ self._cache[key] = ref(value)
+
+ def get(self, key: Union[dict, Hashable]) -> Any:
+ key = self._hashable_key(key)
+
+ value = self._cache[key]()
+ if value is None:
+ del self._cache[key]
+ raise KeyError(f"Key {key} not found, or no longer a valid reference")
+
+ return value
+
+
+def join_iter(joiner: Any, iterable: Iterable) -> Iterable:
+ it = iter(iterable)
+ try:
+ yield next(it)
+ except StopIteration:
+ return
+ for i in it:
+ yield joiner
+ yield i
+
+
+def safezip(*args):
+ "zip but makes sure all sequences are the same length"
+ lens = list(map(len, args))
+ if len(set(lens)) != 1:
+ raise ValueError(f"Mismatching lengths in arguments to safezip: {lens}")
+ return zip(*args)
+
+
+def is_uuid(u):
+ try:
+ UUID(u)
+ except ValueError:
+ return False
+ return True
+
+
+def match_regexps(regexps: Dict[str, Any], s: str) -> Sequence[tuple]:
+ for regexp, v in regexps.items():
+ m = re.match(regexp + "$", s)
+ if m:
+ yield m, v
+
+
+# -- Schema --
+
+V = TypeVar("V")
+
+
+class CaseAwareMapping(MutableMapping[str, V]):
+ @abstractmethod
+ def get_key(self, key: str) -> str:
+ ...
+
+ def new(self, initial=()):
+ return type(self)(initial)
+
+
+class CaseInsensitiveDict(CaseAwareMapping):
+ def __init__(self, initial):
+ self._dict = {k.lower(): (k, v) for k, v in dict(initial).items()}
+
+ def __getitem__(self, key: str) -> V:
+ return self._dict[key.lower()][1]
+
+ def __iter__(self) -> Iterator[V]:
+ return iter(self._dict)
+
+ def __len__(self) -> int:
+ return len(self._dict)
+
+ def __setitem__(self, key: str, value):
+ k = key.lower()
+ if k in self._dict:
+ key = self._dict[k][0]
+ self._dict[k] = key, value
+
+ def __delitem__(self, key: str):
+ del self._dict[key.lower()]
+
+ def get_key(self, key: str) -> str:
+ return self._dict[key.lower()][0]
+
+ def __repr__(self) -> str:
+ return repr(dict(self.items()))
+
+
+class CaseSensitiveDict(dict, CaseAwareMapping):
+ def get_key(self, key):
+ self[key] # Throw KeyError if key doesn't exist
+ return key
+
+ def as_insensitive(self):
+ return CaseInsensitiveDict(self)
+
+
+# -- Alphanumerics --
+
+alphanums = " -" + string.digits + string.ascii_uppercase + "_" + string.ascii_lowercase
+
+
+class ArithString:
+ @classmethod
+ def new(cls, *args, **kw):
+ return cls(*args, **kw)
+
+ def range(self, other: "ArithString", count: int):
+ assert isinstance(other, ArithString)
+ checkpoints = split_space(self.int, other.int, count)
+ return [self.new(int=i) for i in checkpoints]
+
+
+class ArithUUID(UUID, ArithString):
+ "A UUID that supports basic arithmetic (add, sub)"
+
+ def __int__(self):
+ return self.int
+
+ def __add__(self, other: int):
+ if isinstance(other, int):
+ return self.new(int=self.int + other)
+ return NotImplemented
+
+ def __sub__(self, other: Union[UUID, int]):
+ if isinstance(other, int):
+ return self.new(int=self.int - other)
+ elif isinstance(other, UUID):
+ return self.int - other.int
+ return NotImplemented
+
+
+def numberToAlphanum(num: int, base: str = alphanums) -> str:
+ digits = []
+ while num > 0:
+ num, remainder = divmod(num, len(base))
+ digits.append(remainder)
+ return "".join(base[i] for i in digits[::-1])
+
+
+def alphanumToNumber(alphanum: str, base: str = alphanums) -> int:
+ num = 0
+ for c in alphanum:
+ num = num * len(base) + base.index(c)
+ return num
+
+
+def justify_alphanums(s1: str, s2: str):
+ max_len = max(len(s1), len(s2))
+ s1 = s1.ljust(max_len)
+ s2 = s2.ljust(max_len)
+ return s1, s2
+
+
+def alphanums_to_numbers(s1: str, s2: str):
+ s1, s2 = justify_alphanums(s1, s2)
+ n1 = alphanumToNumber(s1)
+ n2 = alphanumToNumber(s2)
+ return n1, n2
+
+
+class ArithAlphanumeric(ArithString):
+ def __init__(self, s: str, max_len=None):
+ if s is None:
+ raise ValueError("Alphanum string cannot be None")
+ if max_len and len(s) > max_len:
+ raise ValueError(f"Length of alphanum value '{s}' is longer than the expected {max_len}")
+
+ for ch in s:
+ if ch not in alphanums:
+ raise ValueError(f"Unexpected character {ch} in alphanum string")
+
+ self._str = s
+ self._max_len = max_len
+
+ # @property
+ # def int(self):
+ # return alphanumToNumber(self._str, alphanums)
+
+ def __str__(self):
+ s = self._str
+ if self._max_len:
+ s = s.rjust(self._max_len, alphanums[0])
+ return s
+
+ def __len__(self):
+ return len(self._str)
+
+ def __repr__(self):
+ return f'alphanum"{self._str}"'
+
+ def __add__(self, other: "Union[ArithAlphanumeric, int]") -> "ArithAlphanumeric":
+ if isinstance(other, int):
+ if other != 1:
+ raise NotImplementedError("not implemented for arbitrary numbers")
+ num = alphanumToNumber(self._str)
+ return self.new(numberToAlphanum(num + 1))
+
+ return NotImplemented
+
+ def range(self, other: "ArithAlphanumeric", count: int):
+ assert isinstance(other, ArithAlphanumeric)
+ n1, n2 = alphanums_to_numbers(self._str, other._str)
+ split = split_space(n1, n2, count)
+ return [self.new(numberToAlphanum(s)) for s in split]
+
+ def __sub__(self, other: "Union[ArithAlphanumeric, int]") -> float:
+ if isinstance(other, ArithAlphanumeric):
+ n1, n2 = alphanums_to_numbers(self._str, other._str)
+ return n1 - n2
+
+ return NotImplemented
+
+ def __ge__(self, other):
+ if not isinstance(other, type(self)):
+ return NotImplemented
+ return self._str >= other._str
+
+ def __lt__(self, other):
+ if not isinstance(other, type(self)):
+ return NotImplemented
+ return self._str < other._str
+
+ def __eq__(self, other):
+ if not isinstance(other, type(self)):
+ return NotImplemented
+ return self._str == other._str
+
+ def new(self, *args, **kw):
+ return type(self)(*args, **kw, max_len=self._max_len)
+
+
+def number_to_human(n):
+ millnames = ["", "k", "m", "b"]
+ n = float(n)
+ millidx = max(
+ 0,
+ min(len(millnames) - 1, int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3))),
+ )
+
+ return "{:.0f}{}".format(n / 10 ** (3 * millidx), millnames[millidx])
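+# Worked example: number_to_human(1234) -> "1k"; number_to_human(25_000_000) -> "25m"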
+
+
+def split_space(start, end, count) -> List[int]:
+ size = end - start
+ assert count <= size, (count, size)
+ return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]
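+# Worked example: split_space(0, 10, 4) -> [2, 4, 6, 8] (checkpoints exclude the endpoints)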
+
+
+def remove_passwords_in_dict(d: dict, replace_with: str = "***"):
+ for k, v in d.items():
+ if k == "password":
+ d[k] = replace_with
+ elif isinstance(v, dict):
+ remove_passwords_in_dict(v, replace_with)
+ elif k.startswith("database"):
+ d[k] = remove_password_from_url(v, replace_with)
+
+
+def _join_if_any(sym, args):
+ args = list(args)
+ if not args:
+ return ""
+ return sym.join(str(a) for a in args if a)
+
+
+def remove_password_from_url(url: str, replace_with: str = "***") -> str:
+ parsed = urlparse(url)
+ account = parsed.username or ""
+ if parsed.password:
+ account += ":" + replace_with
+ host = _join_if_any(":", filter(None, [parsed.hostname, parsed.port]))
+ netloc = _join_if_any("@", filter(None, [account, host]))
+ replaced = parsed._replace(netloc=netloc)
+ return replaced.geturl()
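+# e.g. remove_password_from_url("postgresql://user:secret@host:5432/db")
+#      -> "postgresql://user:***@host:5432/db"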
+
+
+def match_like(pattern: str, strs: Sequence[str]) -> Iterable[str]:
+ reo = re.compile(pattern.replace("%", ".*").replace("?", ".") + "$")
+ for s in strs:
+ if reo.match(s):
+ yield s
+
+
+class UnknownMeta(type):
+ def __instancecheck__(self, instance):
+ return instance is Unknown
+
+ def __repr__(self):
+ return "Unknown"
+
+
+class Unknown(metaclass=UnknownMeta):
+ def __nonzero__(self):
+ raise TypeError()
+
+ def __new__(class_, *args, **kwargs):
+ raise RuntimeError("Unknown is a singleton")
diff --git a/data_diff/table_segment.py b/data_diff/table_segment.py
index cbeac7bf..f997c8c5 100644
--- a/data_diff/table_segment.py
+++ b/data_diff/table_segment.py
@@ -6,11 +6,11 @@
from runtype import dataclass
from .utils import safezip, Vector
-from sqeleton.utils import ArithString, split_space
-from sqeleton.databases import Database, DbPath, DbKey, DbTime
-from sqeleton.schema import Schema, create_schema
-from sqeleton.queries import Count, Checksum, SKIP, table, this, Expr, min_, max_, Code
-from sqeleton.queries.extras import ApplyFuncAndNormalizeAsString, NormalizeAsString
+from data_diff.sqeleton.utils import ArithString, split_space
+from data_diff.sqeleton.databases import Database, DbPath, DbKey, DbTime
+from data_diff.sqeleton.schema import Schema, create_schema
+from data_diff.sqeleton.queries import Count, Checksum, SKIP, table, this, Expr, min_, max_, Code
+from data_diff.sqeleton.queries.extras import ApplyFuncAndNormalizeAsString, NormalizeAsString
logger = logging.getLogger("table_segment")
diff --git a/data_diff/tracking.py b/data_diff/tracking.py
index 582ebf24..e6ecd0ce 100644
--- a/data_diff/tracking.py
+++ b/data_diff/tracking.py
@@ -57,6 +57,26 @@ def set_entrypoint_name(s):
entrypoint_name = s
+dbt_user_id = None
+dbt_version = None
+dbt_project_id = None
+
+
+def set_dbt_user_id(s):
+ global dbt_user_id
+ dbt_user_id = s
+
+
+def set_dbt_version(s):
+ global dbt_version
+ dbt_version = s
+
+
+def set_dbt_project_id(s):
+ global dbt_project_id
+ dbt_project_id = s
+
+
def get_anonymous_id():
global g_anonymous_id
if g_anonymous_id is None:
@@ -78,6 +98,9 @@ def create_start_event_json(diff_options: Dict[str, Any]):
"diff_options": diff_options,
"data_diff_version:": __version__,
"entrypoint_name": entrypoint_name,
+ "dbt_user_id": dbt_user_id,
+ "dbt_version": dbt_version,
+ "dbt_project_id": dbt_project_id,
},
}
@@ -112,6 +135,9 @@ def create_end_event_json(
"entrypoint_name": entrypoint_name,
"is_cloud": is_cloud,
"diff_id": diff_id,
+ "dbt_user_id": dbt_user_id,
+ "dbt_version": dbt_version,
+ "dbt_project_id": dbt_project_id,
},
}
diff --git a/data_diff/utils.py b/data_diff/utils.py
index 89e506ef..81ca5de5 100644
--- a/data_diff/utils.py
+++ b/data_diff/utils.py
@@ -1,3 +1,4 @@
+import json
import logging
import re
from typing import Dict, Iterable, Sequence
@@ -5,6 +6,7 @@
import operator
import threading
from datetime import datetime
+from tabulate import tabulate
def safezip(*args):
@@ -127,3 +129,43 @@ def __sub__(self, other: "Vector"):
def __repr__(self) -> str:
return "(%s)" % ", ".join(str(k) for k in self)
+
+
+def dbt_diff_string_template(
+ rows_added: str, rows_removed: str, rows_updated: str, rows_unchanged: str, extra_info_dict: Dict, extra_info_str: str
+) -> str:
+ string_output = f"\n{tabulate([[rows_added, rows_removed]], headers=['Rows Added', 'Rows Removed'])}"
+
+ string_output += f"\n\nUpdated Rows: {rows_updated}\n"
+ string_output += f"Unchanged Rows: {rows_unchanged}\n\n"
+
+ string_output += extra_info_str
+
+ for k, v in extra_info_dict.items():
+ string_output += f"\n{k}: {v}"
+
+ return string_output
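+# e.g. dbt_diff_string_template("10", "2", "5", "83", {"model": "users"}, "Extra info:")
+# returns a small report: an added/removed table, the updated/unchanged counts,
+# and one "key: value" line per extra_info_dict entry.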
+
+
+def _jsons_equiv(a: str, b: str):
+ try:
+ return json.loads(a) == json.loads(b)
+ except (ValueError, TypeError, json.decoder.JSONDecodeError): # not valid jsons
+ return False
+
+
+def diffs_are_equiv_jsons(diff: list, json_cols: dict):
+ overriden_diff_cols = set()
+ if (len(diff) != 2) or ({diff[0][0], diff[1][0]} != {'+', '-'}):
+ return False, overriden_diff_cols
+ match = True
+ for i, (col_a, col_b) in enumerate(safezip(diff[0][1][1:], diff[1][1][1:])): # index 0 is extra_columns first elem
+ # we only attempt to parse columns of JSON type, but we still need to check if non-json columns don't match
+ match = col_a == col_b
+ if not match and (i in json_cols):
+ if _jsons_equiv(col_a, col_b):
+ overriden_diff_cols.add(json_cols[i])
+ match = True
+ if not match:
+ break
+ return match, overriden_diff_cols
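+# Worked example (hypothetical rows): with
+#   diff = [("+", ("pk1", '{"a": 1}')), ("-", ("pk1", '{"a":1}'))] and json_cols = {0: "payload"},
+# the strings differ but parse to equal JSON, so the result is (True, {"payload"}).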
diff --git a/data_diff/version.py b/data_diff/version.py
index 3d187266..ab55bb1a 100644
--- a/data_diff/version.py
+++ b/data_diff/version.py
@@ -1 +1 @@
-__version__ = "0.5.0"
+__version__ = "0.7.5"
diff --git a/dev/_bq_import_csv.py b/dev/_bq_import_csv.py
index 5f5d17c1..b9a75087 100644
--- a/dev/_bq_import_csv.py
+++ b/dev/_bq_import_csv.py
@@ -3,12 +3,14 @@
client = bigquery.Client()
table_id = "datafold-dev-2.data_diff.tmp_rating"
-dataset_name = 'data_diff'
+dataset_name = "data_diff"
client.create_dataset(dataset_name, exists_ok=True)
job_config = bigquery.LoadJobConfig(
- source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1, autodetect=True,
+ source_format=bigquery.SourceFormat.CSV,
+ skip_leading_rows=1,
+ autodetect=True,
)
with open("ratings.csv", "rb") as source_file:
@@ -17,12 +19,8 @@
job.result() # Waits for the job to complete.
table = client.get_table(table_id) # Make an API request.
-print(
- "Loaded {} rows and {} columns to {}".format(
- table.num_rows, len(table.schema), table_id
- )
-)
+print("Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id))
# run_sql("ALTER TABLE `datafold-dev-2.data_diff.Rating` ADD COLUMN id BYTES")
-# run_sql("UPDATE `datafold-dev-2.data_diff.Rating` SET id = cast(GENERATE_UUID() as bytes) WHERE True")
\ No newline at end of file
+# run_sql("UPDATE `datafold-dev-2.data_diff.Rating` SET id = cast(GENERATE_UUID() as bytes) WHERE True")
diff --git a/dev/presto-conf/standalone/combined.pem b/dev/presto-conf/standalone/combined.pem
new file mode 100644
index 00000000..a2132a13
--- /dev/null
+++ b/dev/presto-conf/standalone/combined.pem
@@ -0,0 +1,52 @@
+-----BEGIN CERTIFICATE-----
+MIIERDCCAyygAwIBAgIUBxO/CflDP+0yZAXt5FKm/WQ4538wDQYJKoZIhvcNAQEL
+BQAwWTELMAkGA1UEBhMCQVUxEzARBgNVBAgMClNvbWUtU3RhdGUxITAfBgNVBAoM
+GEludGVybmV0IFdpZGdpdHMgUHR5IEx0ZDESMBAGA1UEAwwJbG9jYWxob3N0MB4X
+DTIyMDgyNDA4NTI1N1oXDTMyMDgyMTA4NTI1N1owWTELMAkGA1UEBhMCQVUxEzAR
+BgNVBAgMClNvbWUtU3RhdGUxITAfBgNVBAoMGEludGVybmV0IFdpZGdpdHMgUHR5
+IEx0ZDESMBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
+MIIBCgKCAQEA3lNywkj/eGPGoFA3Lcx++98l17CRy+uzZMtJsr6lYAg1p/n1vPw0
+BQXI5TSBJ6vM/axtwgwrfXQsjQ/GYJKQkb6eEBCc3xb+Rk5HNBiBBZsIjYm0U1zz
+7dKnNwAznjx3j72s2ZQiqkoxcu7Bctw28ynbg0rjNkuUk3QESKuOgaTltpWKZiiu
+XwWasREeH6MH7ROy8db6cz+MwGaig0mUvGPmD97bPRD/X683RyOiXzEaogl/rpGK
+qZ3jRsmS8ZwawzKxx16kqPsX8/01EruGIoubMttr3YoZG044zq7nQqdAAz6wXx6V
+mgzToCHI+/g+8JS/bgqJTyb2Y6aGXExiuQIDAQABo4IBAjCB/zAdBgNVHQ4EFgQU
+5i1F8pTnwjFxw6W/0RjwpaJaK9MwgZYGA1UdIwSBjjCBi4AU5i1F8pTnwjFxw6W/
+0RjwpaJaK9OhXaRbMFkxCzAJBgNVBAYTAkFVMRMwEQYDVQQIDApTb21lLVN0YXRl
+MSEwHwYDVQQKDBhJbnRlcm5ldCBXaWRnaXRzIFB0eSBMdGQxEjAQBgNVBAMMCWxv
+Y2FsaG9zdIIUBxO/CflDP+0yZAXt5FKm/WQ4538wDAYDVR0TBAUwAwEB/zALBgNV
+HQ8EBAMCAvwwFAYDVR0RBA0wC4IJbG9jYWxob3N0MBQGA1UdEgQNMAuCCWxvY2Fs
+aG9zdDANBgkqhkiG9w0BAQsFAAOCAQEAjBQLl/UFSd9TH2VLM1GH8bixtEJ9+rm7
+x+Jw665+XLjW107dJ33qxy9zjd3cZ2fynKg2Tb7+9QAvSlqpt2YMGP9jr4W2w16u
+ngbNB+kfoOotcUChk90aHmdHLZgOOve/ArFIvbr8douLOn0NAJBrj+iX4zC1pgEC
+9hsMUekkAPIcCGc0rEkEc8r8uiUBWNAdEWpBt0X2fE1ownLuB/E/+3HutLTw8Lv0
+b+jNt/vogVixcw/FF4atoO+F7S5FYzAb0U7YXaNISfVPVBsA89oPy7PlxULHDUIF
+Iq+vVqKdj1EXR+Iec0TMiMsa3MnIGkpL7ZuUXaG+xGBaVhGrUp67lQ==
+-----END CERTIFICATE-----
+-----BEGIN RSA PRIVATE KEY-----
+MIIEogIBAAKCAQEA3lNywkj/eGPGoFA3Lcx++98l17CRy+uzZMtJsr6lYAg1p/n1
+vPw0BQXI5TSBJ6vM/axtwgwrfXQsjQ/GYJKQkb6eEBCc3xb+Rk5HNBiBBZsIjYm0
+U1zz7dKnNwAznjx3j72s2ZQiqkoxcu7Bctw28ynbg0rjNkuUk3QESKuOgaTltpWK
+ZiiuXwWasREeH6MH7ROy8db6cz+MwGaig0mUvGPmD97bPRD/X683RyOiXzEaogl/
+rpGKqZ3jRsmS8ZwawzKxx16kqPsX8/01EruGIoubMttr3YoZG044zq7nQqdAAz6w
+Xx6VmgzToCHI+/g+8JS/bgqJTyb2Y6aGXExiuQIDAQABAoIBAD3pKwnjXhDeaA94
+hwUf7zSgfV9E8jTBHCGzYoB+CntljduLBd1stee4Jqt9JYIwm1MA00e4L9wtn8Jg
+ZDO8XLnZRRbgKW8ObhyR684cDMHM3GLdt/OG7P6LLLlqOvWTjQ/gF+Q3FjgplP+W
+cRRVMpAgVdqH3iHehi9RnWfHLlX3WkBC97SumWFzWqBnqUQAC8AvFHiUCqA9qIeA
+8ieOEoE17yv2nkmu+A5OZoCXtVfc2lQ90Fj9QZiv4rIVXBtTRRURJuvi2iX3nOPl
+MsjAUIBK1ndzpJ7wuLICSR1U3/npPC6Va06lTm0H/Q6DEqZjEHbx9TGY3pTgVXuA
++G0C5GkCgYEA/spvoDUMZrH2JE43TT/qMEHPtHW4qT6fTmzu08Gx++8nFmddNgSD
+zrdjxqPMUGV7Q5smwoHaQyqFxHMM2jh8icnV6VoBDrDdZM0eGFAs6HUjKmyaAdQO
+dC4kPiy3LX5pJUnQnmwq1fVsgXWGQF/LhD0L/y6xOiqdhZp/8nv6SFMCgYEA32GR
+gWJQdgWXTXxSEDn0twKPevgAT0s778/7h5osCLG82Q7ab2+Fc1qTleiwiQ2SAuOl
+mWvtz0Eg4dYe/q6jugqkEgAYZvdHGL7CSmC28O98fTLapgKQC5GUUan90sCbRec4
+kjbyx5scICNBYJVchdFg6UUSNz5czORUVgQEF0MCgYB1toUX2Spfj7yOTWyTTgIe
+RWl2kCS+XGYxT3aPcp+OK5E9cofH2xIiQOvh6+8K/beTJm0j0+ZIva6LcjPv5cTz
+y8H+S0zNwrymQ3Wx+eilhOi4QvBsA9KhrmekKfh/FjXxukadyo+HxhlZPjjGKPvX
+nnSacrICk4mvHhAasViSbQKBgD7mZGiAXJO/I0moVhtHlobp66j+qGerkacHc5ZN
+bVTNZ5XfPtbeGj/PI3u01/Dfp1u06m53G7GebznoZzXjyyqZ0HVZHYXw304yeNck
+wJ67cNx4M2VHl3QKfC86pMRxg8d9Qkq5ukdGf/b0tnYR2Mm9mYJV9rkjkFIJgU3v
+N4+tAoGABOlVGuRx2cSQ9QeC0AcqKlxXygdrzyadA7i0KNBZGGyrMSpJDrl2rrRn
+ylzAgGjvfilwQzZuqTm6Vo2yvaX+TTGS44B+DnxCZvuviftea++sNMjuEkBLTCpF
+xk2yOzsOnx652kWO4L+dVrDAxl65f3v0YaKWZI504LFYl18uS/E=
+-----END RSA PRIVATE KEY-----
diff --git a/dev/presto-conf/standalone/password-authenticator.properties b/dev/presto-conf/standalone/password-authenticator.properties
new file mode 100644
index 00000000..68ced471
--- /dev/null
+++ b/dev/presto-conf/standalone/password-authenticator.properties
@@ -0,0 +1,2 @@
+password-authenticator.name=file
+file.password-file=/opt/presto/etc/password.db
diff --git a/dev/presto-conf/standalone/password.db b/dev/presto-conf/standalone/password.db
new file mode 100644
index 00000000..6d499304
--- /dev/null
+++ b/dev/presto-conf/standalone/password.db
@@ -0,0 +1 @@
+test:$2y$10$877iU3J5a26SPDjFSrrz2eFAq2DwMDsBAus92Dj0z5A5qNMNlnpHa
diff --git a/docs/common_use_cases.md b/docs/common_use_cases.md
new file mode 100644
index 00000000..cb084779
--- /dev/null
+++ b/docs/common_use_cases.md
@@ -0,0 +1,14 @@
+# Common Use Cases
+
+## joindiff
+- **Inspect differences between branches**. Make sure your code results in only expected changes.
+- **Validate stability of critical downstream tables**. When refactoring a data pipeline, confirm that the changes you make to upstream models have not impacted the critical downstream models that users and systems depend on.
+- **Conduct better code reviews**. No matter how thoughtfully you review the code, run a diff to ensure that you don't accidentally approve an error.
+
+## hashdiff
+- **Verify data migrations**. Verify that all data was copied when doing a critical data migration. For example, migrating from Heroku PostgreSQL to Amazon RDS.
+- **Verify data pipelines**. Confirm that no data is lost when moving it from a relational database to a warehouse/data lake with Fivetran, Airbyte, Debezium, or some other pipeline.
+- **Maintain data integrity SLOs**. You can create and monitor your SLO of e.g. 99.999% data integrity, and alert your team when data is missing.
+- **Debug complex data pipelines**. Data can get lost in pipelines that may span a half-dozen systems. data-diff helps you efficiently track down where a row got lost without needing to individually inspect intermediate datastores.
+- **Detect hard deletes for an `updated_at`-based pipeline**. If you're copying data to your warehouse based on an `updated_at`-style column, data-diff can find any hard-deletes that you may have missed.
+- **Make your replication self-healing**. You can use data-diff to self-heal by using the diff output to write/update rows in the target database.
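+
+A minimal sketch of the self-healing idea using the Python API (the connection strings are illustrative, the `-` sign convention is an assumption to verify against your data-diff version, and `apply_row` is a hypothetical writer you would implement):
+
+```python
+from data_diff import connect_to_table, diff_tables
+
+source = connect_to_table("postgresql:///", "ratings", "id")
+target = connect_to_table("snowflake://...", "ratings", "id")
+
+for plus_or_minus, columns in diff_tables(source, target):
+ if plus_or_minus == "-": # assumed: the row is missing from the target
+ apply_row(columns) # hypothetical writer that upserts the row into the target
+```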
\ No newline at end of file
diff --git a/docs/conf.py b/docs/conf.py
index dc58fb90..c0e19068 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -90,6 +90,10 @@
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
+autodoc_default_options = {
+ # 'special-members': '__init__',
+ 'exclude-members': 'json,aslist,astuple,replace',
+}
# -- Options for HTML output ----------------------------------------------
@@ -153,7 +157,7 @@
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
- (master_doc, "Datadiff.tex", "Datadiff Documentation", "Erez Shinan", "manual"),
+ (master_doc, "Datadiff.tex", "Datadiff Documentation", author, "manual"),
]
diff --git a/docs/how-to-use.md b/docs/how-to-use.md
deleted file mode 100644
index f65048b8..00000000
--- a/docs/how-to-use.md
+++ /dev/null
@@ -1,159 +0,0 @@
-# How to use
-
-## How to use from the shell (or: command-line)
-
-Run the following command:
-
-```bash
- # Same-DB diff, using outer join
- $ data-diff DB TABLE1 TABLE2 [options]
-
- # Cross-DB diff, using hashes
- $ data-diff DB1 TABLE1 DB2 TABLE2 [options]
-```
-
-Where DB is either a database URL that's compatible with SQLAlchemy, or the name of a database specified in a configuration file.
-
-We recommend using a configuration file, with the ``--conf`` switch, to keep the command simple and manageable.
-
-For a list of example URLs, see [list of supported databases](supported-databases.md).
-
-Note: Because URLs allow many special characters, and may collide with the syntax of your command-line,
-it's recommended to surround them with quotes.
-
-### Options
-
- - `--help` - Show help message and exit.
- - `-k` or `--key-columns` - Name of the primary key column. If none provided, default is 'id'.
- - `-t` or `--update-column` - Name of updated_at/last_updated column
- - `-c` or `--columns` - Names of extra columns to compare. Can be used more than once in the same command.
- Accepts a name or a pattern like in SQL.
- Example: `-c col% -c another_col -c %foorb.r%`
- - `-l` or `--limit` - Maximum number of differences to find (limits maximum bandwidth and runtime)
- - `-s` or `--stats` - Print stats instead of a detailed diff
- - `-d` or `--debug` - Print debug info
- - `-v` or `--verbose` - Print extra info
- - `-i` or `--interactive` - Confirm queries, implies `--debug`
- - `--json` - Print JSONL output for machine readability
- - `--min-age` - Considers only rows older than specified. Useful for specifying replication lag.
- Example: `--min-age=5min` ignores rows from the last 5 minutes.
- Valid units: `d, days, h, hours, min, minutes, mon, months, s, seconds, w, weeks, y, years`
- - `--max-age` - Considers only rows younger than specified. See `--min-age`.
- - `-j` or `--threads` - Number of worker threads to use per database. Default=1.
- - `-w`, `--where` - An additional 'where' expression to restrict the search space.
- - `--conf`, `--run` - Specify the run and configuration from a TOML file. (see below)
- - `--no-tracking` - data-diff sends home anonymous usage data. Use this to disable it.
- - `--bisection-threshold` - Minimal size of segment to be split. Smaller segments will be downloaded and compared locally.
- - `--bisection-factor` - Segments per iteration. When set to 2, it performs binary search.
- - `-m`, `--materialize` - Materialize the diff results into a new table in the database.
- If a table exists by that name, it will be replaced.
- Use `%t` in the name to place a timestamp.
- Example: `-m test_mat_%t`
- - `--assume-unique-key` - Skip validating the uniqueness of the key column during joindiff, which is costly in non-cloud dbs.
- - `--sample-exclusive-rows` - Sample several rows that only appear in one of the tables, but not the other. Use with `-s`.
- - `--materialize-all-rows` - Materialize every row, even if they are the same, instead of just the differing rows.
- - `--table-write-limit` - Maximum number of rows to write when creating materialized or sample tables, per thread. Default=1000.
- - `-a`, `--algorithm` `[auto|joindiff|hashdiff]` - Force algorithm choice
-
-
-
-### How to use with a configuration file
-
-Data-diff lets you load the configuration for a run from a TOML file.
-
-**Reasons to use a configuration file:**
-
-- Convenience: Set-up the parameters for diffs that need to run often
-
-- Easier and more readable: You can define the database connection settings as config values, instead of in a URI.
-
-- Gives you fine-grained control over the settings switches, without requiring any Python code.
-
-Use `--conf` to specify that path to the configuration file. data-diff will load the settings from `run.default`, if it's defined.
-
-Then you can, optionally, use `--run` to choose to load the settings of a specific run, and override the settings `run.default`. (all runs extend `run.default`, like inheritance).
-
-Finally, CLI switches have the final say, and will override the settings defined by the configuration file, and the current run.
-
-Example TOML file:
-
-```toml
-# Specify the connection params to the test database.
-[database.test_postgresql]
-driver = "postgresql"
-user = "postgres"
-password = "Password1"
-
-# Specify the default run params
-[run.default]
-update_column = "timestamp"
-verbose = true
-
-# Specify params for a run 'test_diff'.
-[run.test_diff]
-verbose = false
-# Source 1 ("left")
-1.database = "test_postgresql" # Use options from database.test_postgresql
-1.table = "rating"
-# Source 2 ("right")
-2.database = "postgresql://postgres:Password1@/" # Use URI like in the CLI
-2.table = "rating_del1"
-```
-
-In this example, running `data-diff --conf myconfig.toml --run test_diff` will compare between `rating` and `rating_del1`.
-It will use the `timestamp` column as the update column, as specified in `run.default`. However, it won't be verbose, since that
-flag is overwritten to `false`.
-
-Running it with `data-diff --conf myconfig.toml --run test_diff -v` will set verbose back to `true`.
-
-
-## How to use from Python
-
-Import the `data_diff` module, and use the following functions:
-
-- `connect_to_table()` to connect to a specific table in the database
-
-- `diff_tables()` to diff those tables
-
-
-Example:
-
-```python
-# Optional: Set logging to display the progress of the diff
-import logging
-logging.basicConfig(level=logging.INFO)
-
-from data_diff import connect_to_table, diff_tables
-
-table1 = connect_to_table("postgresql:///", "table_name", "id")
-table2 = connect_to_table("mysql:///", "table_name", "id")
-
-for different_row in diff_tables(table1, table2):
- plus_or_minus, columns = different_row
- print(plus_or_minus, columns)
-```
-
-Run `help(diff_tables)` or [read the docs](https://data-diff.readthedocs.io/en/latest/) to learn about the different options.
-
-## Usage Analytics & Data Privacy
-
-data-diff collects anonymous usage data to help our team improve the tool and to apply development efforts to where our users need them most.
-
-We capture two events: one when the data-diff run starts, and one when it is finished. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:
-
-- Operating System and Python version
-- Types of databases used (postgresql, mysql, etc.)
-- Sizes of tables diffed, run time, and diff row count (numbers only)
-- Error message, if any, truncated to the first 20 characters.
-- A persistent UUID to identify the session, stored in `~/.datadiff.toml`
-
-If you do not wish to participate, the tracking can be easily disabled with one of the following methods:
-
-* In the CLI, use the `--no-tracking` flag.
-* In the config file, set `no_tracking = true` (for example, under `[run.default]`)
-* If you're using the Python API:
-```python
-import data_diff
-data_diff.disable_tracking() # Call this first, before making any API calls
-# Connect and diff your tables without any tracking
-```
diff --git a/docs/index.rst b/docs/index.rst
index b20d77ed..ffe3bf6b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -4,22 +4,12 @@
:hidden:
python-api
+ python_examples
data-diff
---------
-**Data-diff** is a command-line tool and Python library to efficiently diff
-rows across two different databases.
-
-⇄ Verifies across many different databases (e.g. *PostgreSQL* -> *Snowflake*) !
-
-🔍 Outputs diff of rows in detail
-
-🚨 Simple CLI/API to create monitoring and alerts
-
-🔥 Verify 25M+ rows in <10s, and 1B+ rows in ~5min.
-
-♾️ Works for tables with 10s of billions of rows
+**Data-diff** is a command-line tool and Python library for comparing tables in and across databases.
For more information, `See our README <https://github.com/datafold/data-diff>`_
@@ -29,7 +19,6 @@ Resources
---------
- Source code (git): `<https://github.com/datafold/data-diff>`_
-- :doc:`python-api`
- The rest of the `documentation`_
-.. _documentation: https://docs.datafold.com/os_diff/about/
+.. _documentation: https://docs.datafold.com/guides/os_data_diff
diff --git a/docs/python_examples.rst b/docs/python_examples.rst
new file mode 100644
index 00000000..a3a3bf0f
--- /dev/null
+++ b/docs/python_examples.rst
@@ -0,0 +1,44 @@
+Python API Examples
+-------------------
+
+**Example 1: Diff tables in MySQL and PostgreSQL**
+
+.. code-block:: python
+
+ # Optional: Set logging to display the progress of the diff
+ import logging
+ logging.basicConfig(level=logging.INFO)
+
+ from data_diff import connect_to_table, diff_tables
+
+ table1 = connect_to_table("postgresql:///", "table_name", "id")
+ table2 = connect_to_table("mysql:///", "table_name", "id")
+
+ for different_row in diff_tables(table1, table2):
+ plus_or_minus, columns = different_row
+ print(plus_or_minus, columns)
+
+
+**Example 2: Connect to Snowflake using a dictionary configuration**
+
+.. code-block:: python
+
+ SNOWFLAKE_CONN_INFO = {
+ "driver": "snowflake",
+ "user": "erez",
+ "account": "whatever",
+ "database": "TESTS",
+ "warehouse": "COMPUTE_WH",
+ "role": "ACCOUNTADMIN",
+ "schema": "PUBLIC",
+ "key": "snowflake_rsa_key.p8",
+ }
+
+ snowflake_table = connect_to_table(SNOWFLAKE_CONN_INFO, "table_name") # Uses id by default
+
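+**Example 3: Diff the Snowflake table against a PostgreSQL table**
+
+A minimal follow-up sketch, reusing ``snowflake_table`` from Example 2; the connection string, table name, and key column are placeholders:
+
+.. code-block:: python
+
+ from data_diff import connect_to_table, diff_tables
+
+ postgres_table = connect_to_table("postgresql:///", "table_name", "id")
+
+ for plus_or_minus, columns in diff_tables(snowflake_table, postgres_table):
+ print(plus_or_minus, columns)
+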
+Run ``help(connect_to_table)`` and ``help(diff_tables)``, or read our API reference to learn more about the different options:
+
+- connect_to_table_
+
+- diff_tables_
+
+.. _connect_to_table: https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.connect_to_table
+.. _diff_tables: https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.diff_tables
\ No newline at end of file
diff --git a/docs/usage_analytics.md b/docs/usage_analytics.md
new file mode 100644
index 00000000..6262a5e7
--- /dev/null
+++ b/docs/usage_analytics.md
@@ -0,0 +1,22 @@
+# Usage Analytics & Data Privacy
+
+data-diff collects anonymous usage data to help our team improve the tool and to apply development efforts to where our users need them most.
+
+We capture two events: one when the data-diff run starts, and one when it is finished. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:
+
+- Operating System and Python version
+- Types of databases used (postgresql, mysql, etc.)
+- Sizes of tables diffed, run time, and diff row count (numbers only)
+- Error message, if any, truncated to the first 20 characters.
+- A persistent UUID to identify the session, stored in `~/.datadiff.toml`
+
+To disable, use one of the following methods:
+
+* **CLI**: use the `--no-tracking` flag.
+* **Config file**: set `no_tracking = true` (for example, under `[run.default]`)
+* **Python API**:
+ ```python
+ import data_diff
+ # Invoke the following before making any API calls
+ data_diff.disable_tracking()
+ ```
\ No newline at end of file
diff --git a/poetry.lock b/poetry.lock
index 5bb97d78..2dc50831 100644
--- a/poetry.lock
+++ b/poetry.lock
@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 1.4.0 and should not be changed by hand.
+# This file is automatically @generated by Poetry 1.4.2 and should not be changed by hand.
[[package]]
name = "agate"
@@ -47,7 +47,7 @@ test = ["coverage", "flake8", "pexpect", "wheel"]
name = "arrow"
version = "1.2.3"
description = "Better dates & times for Python"
-category = "dev"
+category = "main"
optional = false
python-versions = ">=3.6"
files = [
@@ -73,22 +73,25 @@ files = [
[[package]]
name = "attrs"
-version = "22.2.0"
+version = "23.1.0"
description = "Classes Without Boilerplate"
category = "main"
optional = false
-python-versions = ">=3.6"
+python-versions = ">=3.7"
files = [
- {file = "attrs-22.2.0-py3-none-any.whl", hash = "sha256:29e95c7f6778868dbd49170f98f8818f78f3dc5e0e37c0b1f474e3561b240836"},
- {file = "attrs-22.2.0.tar.gz", hash = "sha256:c9227bfc2f01993c03f68db37d1d15c9690188323c067c641f1a35ca58185f99"},
+ {file = "attrs-23.1.0-py3-none-any.whl", hash = "sha256:1f28b4522cdc2fb4256ac1a020c78acf9cba2c6b461ccd2c126f3aa8e8335d04"},
+ {file = "attrs-23.1.0.tar.gz", hash = "sha256:6279836d581513a26f1bf235f9acd333bc9115683f14f7e8fae46c98fc50e015"},
]
+[package.dependencies]
+importlib-metadata = {version = "*", markers = "python_version < \"3.8\""}
+
[package.extras]
-cov = ["attrs[tests]", "coverage-enable-subprocess", "coverage[toml] (>=5.3)"]
-dev = ["attrs[docs,tests]"]
-docs = ["furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-towncrier", "towncrier", "zope.interface"]
-tests = ["attrs[tests-no-zope]", "zope.interface"]
-tests-no-zope = ["cloudpickle", "cloudpickle", "hypothesis", "hypothesis", "mypy (>=0.971,<0.990)", "mypy (>=0.971,<0.990)", "pympler", "pympler", "pytest (>=4.3.0)", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-mypy-plugins", "pytest-xdist[psutil]", "pytest-xdist[psutil]"]
+cov = ["attrs[tests]", "coverage[toml] (>=5.3)"]
+dev = ["attrs[docs,tests]", "pre-commit"]
+docs = ["furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-towncrier", "towncrier", "zope-interface"]
+tests = ["attrs[tests-no-zope]", "zope-interface"]
+tests-no-zope = ["cloudpickle", "hypothesis", "mypy (>=1.1.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
[[package]]
name = "babel"
@@ -547,6 +550,32 @@ sdist = ["setuptools-rust (>=0.11.4)"]
ssh = ["bcrypt (>=3.1.5)"]
test = ["hypothesis (>=1.11.4,!=3.79.2)", "iso8601", "pretend", "pytest (>=6.2.0)", "pytest-cov", "pytest-subtests", "pytest-xdist", "pytz"]
+[[package]]
+name = "cx-oracle"
+version = "8.3.0"
+description = "Python interface to Oracle"
+category = "main"
+optional = true
+python-versions = "*"
+files = [
+ {file = "cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:b6a23da225f03f50a81980c61dbd6a358c3575f212ca7f4c22bb65a9faf94f7f"},
+ {file = "cx_Oracle-8.3.0-cp310-cp310-win32.whl", hash = "sha256:715a8bbda5982af484ded14d184304cc552c1096c82471dd2948298470e88a04"},
+ {file = "cx_Oracle-8.3.0-cp310-cp310-win_amd64.whl", hash = "sha256:07f01608dfb6603a8f2a868fc7c7bdc951480f187df8dbc50f4d48c884874e6a"},
+ {file = "cx_Oracle-8.3.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:4b3afe7a911cebaceda908228d36839f6441cbd38e5df491ec25960562bb01a0"},
+ {file = "cx_Oracle-8.3.0-cp36-cp36m-win32.whl", hash = "sha256:076ffb71279d6b2dcbf7df028f62a01e18ce5bb73d8b01eab582bf14a62f4a61"},
+ {file = "cx_Oracle-8.3.0-cp36-cp36m-win_amd64.whl", hash = "sha256:b82e4b165ffd807a2bd256259a6b81b0a2452883d39f987509e2292d494ea163"},
+ {file = "cx_Oracle-8.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:b902db61dcdcbbf8dd981f5a46d72fef40c5150c7fc0eb0f0698b462d6eb834e"},
+ {file = "cx_Oracle-8.3.0-cp37-cp37m-win32.whl", hash = "sha256:4c82ca74442c298ceec56d207450c192e06ecf8ad52eb4aaad0812e147ceabf7"},
+ {file = "cx_Oracle-8.3.0-cp37-cp37m-win_amd64.whl", hash = "sha256:54164974d526b76fdefb0b66a42b68e1fca5df78713d0eeb8c1d0047b83f6bcf"},
+ {file = "cx_Oracle-8.3.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:410747d542e5f94727f5f0e42e9706c772cf9094fb348ce965ab88b3a9e4d2d8"},
+ {file = "cx_Oracle-8.3.0-cp38-cp38-win32.whl", hash = "sha256:3baa878597c5fadb2c72f359f548431c7be001e722ce4a4ebdf3d2293a1bb70b"},
+ {file = "cx_Oracle-8.3.0-cp38-cp38-win_amd64.whl", hash = "sha256:de42bdc882abdc5cea54597da27a05593b44143728e5b629ad5d35decb1a2036"},
+ {file = "cx_Oracle-8.3.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:df412238a9948340591beee9ec64fa62a2efacc0d91107034a7023e2991fba97"},
+ {file = "cx_Oracle-8.3.0-cp39-cp39-win32.whl", hash = "sha256:70d3cf030aefd71f99b45beba77237b2af448adf5e26be0db3d0d3dee6ea4230"},
+ {file = "cx_Oracle-8.3.0-cp39-cp39-win_amd64.whl", hash = "sha256:bf01ce87edb4ef663b2e5bd604e1e0154d2cc2f12b60301f788b569d9db8a900"},
+ {file = "cx_Oracle-8.3.0.tar.gz", hash = "sha256:3b2d215af4441463c97ea469b9cc307460739f89fdfa8ea222ea3518f1a424d9"},
+]
+
[[package]]
name = "datamodel-code-generator"
version = "0.13.5"
@@ -578,18 +607,18 @@ http = ["httpx"]
[[package]]
name = "datamodel-code-generator"
-version = "0.14.1"
+version = "0.18.0"
description = "Datamodel Code Generator"
category = "main"
optional = false
python-versions = ">=3.7,<4.0"
files = [
- {file = "datamodel_code_generator-0.14.1-py3-none-any.whl", hash = "sha256:c41415bf1ac3b16886eff798adfdd4c1532cc85217b4bc31155b803ebc225370"},
- {file = "datamodel_code_generator-0.14.1.tar.gz", hash = "sha256:3fe9a6545680c8fb02f8f89dc24b364e1c268d0bf0dfc76ca9747c72fcfb60ad"},
+ {file = "datamodel_code_generator-0.18.0-py3-none-any.whl", hash = "sha256:d8a1a2c1fb115e26ac3880d3b373d14c178849f0c842d6b4738a57ee43beeff0"},
+ {file = "datamodel_code_generator-0.18.0.tar.gz", hash = "sha256:5acf69bcf3c440e90a7d0c7b99ca100c8f6ce0348aa4b3da8ee4f9b0242d164c"},
]
[package.dependencies]
-argcomplete = ">=1.10,<3.0"
+argcomplete = ">=1.10,<4.0"
black = ">=19.10b0"
genson = ">=1.2.1,<2.0"
inflect = ">=4.1.0,<6.0"
@@ -599,29 +628,25 @@ openapi-spec-validator = ">=0.2.8,<=0.5.1"
packaging = "*"
prance = ">=0.18.2,<1.0"
pydantic = [
- {version = ">=1.5.1,<2.0", extras = ["email"], markers = "python_version < \"3.10\""},
- {version = ">=1.9.0,<2.0", extras = ["email"], markers = "python_version >= \"3.10\" and python_version < \"3.11\""},
+ {version = ">=1.5.1,<2.0.0", extras = ["email"], markers = "python_version < \"3.10\""},
+ {version = ">=1.9.0,<2.0.0", extras = ["email"], markers = "python_version >= \"3.10\" and python_version < \"3.11\""},
]
PySnooper = ">=0.4.1,<2.0.0"
toml = ">=0.10.0,<1.0.0"
-typed-ast = [
- {version = ">=1.4.2", markers = "python_full_version < \"3.9.8\""},
- {version = ">=1.5.0", markers = "python_full_version >= \"3.9.8\""},
-]
[package.extras]
http = ["httpx"]
[[package]]
name = "dbt-artifacts-parser"
-version = "0.2.5"
+version = "0.3.0"
description = "A dbt artifacts parser in python"
category = "main"
optional = false
python-versions = ">=3.7.0"
files = [
- {file = "dbt-artifacts-parser-0.2.5.tar.gz", hash = "sha256:174c6b5f44e41b1bbd0055d7a1439bc07a2be445aca26948382f3ab427b8228c"},
- {file = "dbt_artifacts_parser-0.2.5-py3-none-any.whl", hash = "sha256:2a8332f7605001cc92f8fbb261cc0bd8551f3e2f624abb2d61ad92b064b41e45"},
+ {file = "dbt-artifacts-parser-0.3.0.tar.gz", hash = "sha256:8da48fd0f294270609f2270744c179de84a214d83cab257cdeb75f63e5f88722"},
+ {file = "dbt_artifacts_parser-0.3.0-py3-none-any.whl", hash = "sha256:da42cda2c2b9cb3ef83e5e3fbeb4ad5ee350038d91dd7376f88d5207b13d2034"},
]
[package.dependencies]
@@ -629,7 +654,7 @@ datamodel-code-generator = ">=0.12.0"
pydantic = ">=1.6"
[package.extras]
-dev = ["build (==0.7.0)", "flit (==3.7.1)", "pdoc3 (>=0.9.2)", "pyyaml (>=5.3)", "yapf (>=0.29.0)"]
+dev = ["build (==0.7.0)", "flit (==3.7.1)", "pdoc3 (>=0.9.2)", "pre-commit (>=2.15.0)", "pyyaml (>=5.3)", "yapf (>=0.29.0)"]
test = ["black (==21.9b0)", "flake8 (>=3.8.3,<4.0.0)", "isort (>=5.0.6,<6.0.0)", "mypy (==0.910)", "pylint (>=2.12.0)", "pytest (>=6.2.4,<7.0.0)", "yapf (>=0.29.0)"]
[[package]]
@@ -781,18 +806,18 @@ files = [
[[package]]
name = "email-validator"
-version = "1.3.0"
+version = "2.0.0.post1"
description = "A robust email address syntax and deliverability validation library."
category = "main"
optional = false
-python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,>=2.7"
+python-versions = ">=3.7"
files = [
- {file = "email_validator-1.3.0-py2.py3-none-any.whl", hash = "sha256:816073f2a7cffef786b29928f58ec16cdac42710a53bb18aa94317e3e145ec5c"},
- {file = "email_validator-1.3.0.tar.gz", hash = "sha256:553a66f8be2ec2dea641ae1d3f29017ab89e9d603d4a25cdaac39eefa283d769"},
+ {file = "email_validator-2.0.0.post1-py3-none-any.whl", hash = "sha256:26efa040ae50e65cc130667080fa0f372f0ac3d852923a76166a54cf6a0ee780"},
+ {file = "email_validator-2.0.0.post1.tar.gz", hash = "sha256:314114acd9421728ae6f74d0c0a5d6ec547d44ef4f20425af4093828af2266f3"},
]
[package.dependencies]
-dnspython = ">=1.15.0"
+dnspython = ">=2.0.0"
idna = ">=2.0.0"
[[package]]
@@ -866,6 +891,25 @@ docs = ["furo", "jaraco.packaging (>=9)", "jaraco.tidelift (>=1.4)", "rst.linker
perf = ["ipython"]
testing = ["flake8 (<5)", "flufl.flake8", "importlib-resources (>=1.3)", "packaging", "pyfakefs", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)", "pytest-perf (>=0.9.2)"]
+[[package]]
+name = "importlib-resources"
+version = "5.12.0"
+description = "Read resources from Python packages"
+category = "main"
+optional = false
+python-versions = ">=3.7"
+files = [
+ {file = "importlib_resources-5.12.0-py3-none-any.whl", hash = "sha256:7b1deeebbf351c7578e09bf2f63fa2ce8b5ffec296e0d349139d43cca061a81a"},
+ {file = "importlib_resources-5.12.0.tar.gz", hash = "sha256:4be82589bf5c1d7999aedf2a45159d10cb3ca4f19b2271f8792bc8e6da7b22f6"},
+]
+
+[package.dependencies]
+zipp = {version = ">=3.1.0", markers = "python_version < \"3.10\""}
+
+[package.extras]
+docs = ["furo", "jaraco.packaging (>=9)", "jaraco.tidelift (>=1.4)", "rst.linker (>=1.9)", "sphinx (>=3.5)", "sphinx-lint"]
+testing = ["flake8 (<5)", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)"]
+
[[package]]
name = "inflect"
version = "5.6.2"
@@ -915,6 +959,41 @@ pipfile-deprecated-finder = ["pipreqs", "requirementslib"]
plugins = ["setuptools"]
requirements-deprecated-finder = ["pip-api", "pipreqs"]
+[[package]]
+name = "jaraco-classes"
+version = "3.2.3"
+description = "Utility functions for Python class constructs"
+category = "main"
+optional = false
+python-versions = ">=3.7"
+files = [
+ {file = "jaraco.classes-3.2.3-py3-none-any.whl", hash = "sha256:2353de3288bc6b82120752201c6b1c1a14b058267fa424ed5ce5984e3b922158"},
+ {file = "jaraco.classes-3.2.3.tar.gz", hash = "sha256:89559fa5c1d3c34eff6f631ad80bb21f378dbcbb35dd161fd2c6b93f5be2f98a"},
+]
+
+[package.dependencies]
+more-itertools = "*"
+
+[package.extras]
+docs = ["jaraco.packaging (>=9)", "jaraco.tidelift (>=1.4)", "rst.linker (>=1.9)", "sphinx (>=3.5)"]
+testing = ["flake8 (<5)", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)"]
+
+[[package]]
+name = "jeepney"
+version = "0.8.0"
+description = "Low-level, pure Python DBus protocol wrapper."
+category = "main"
+optional = false
+python-versions = ">=3.7"
+files = [
+ {file = "jeepney-0.8.0-py3-none-any.whl", hash = "sha256:c0a454ad016ca575060802ee4d590dd912e35c122fa04e70306de3d076cce755"},
+ {file = "jeepney-0.8.0.tar.gz", hash = "sha256:5efe48d255973902f6badc3ce55e2aa6c5c3b3bc642059ef3a91247bcfcc5806"},
+]
+
+[package.extras]
+test = ["async-timeout", "pytest", "pytest-asyncio (>=0.17)", "pytest-trio", "testpath", "trio"]
+trio = ["async_generator", "trio"]
+
[[package]]
name = "jinja2"
version = "2.11.3"
@@ -955,11 +1034,36 @@ six = ">=1.11.0"
[package.extras]
format = ["idna", "jsonpointer (>1.13)", "rfc3987", "strict-rfc3339", "webcolors"]
+[[package]]
+name = "keyring"
+version = "23.13.1"
+description = "Store and access your passwords safely."
+category = "main"
+optional = false
+python-versions = ">=3.7"
+files = [
+ {file = "keyring-23.13.1-py3-none-any.whl", hash = "sha256:771ed2a91909389ed6148631de678f82ddc73737d85a927f382a8a1b157898cd"},
+ {file = "keyring-23.13.1.tar.gz", hash = "sha256:ba2e15a9b35e21908d0aaf4e0a47acc52d6ae33444df0da2b49d41a46ef6d678"},
+]
+
+[package.dependencies]
+importlib-metadata = {version = ">=4.11.4", markers = "python_version < \"3.12\""}
+importlib-resources = {version = "*", markers = "python_version < \"3.9\""}
+"jaraco.classes" = "*"
+jeepney = {version = ">=0.4.2", markers = "sys_platform == \"linux\""}
+pywin32-ctypes = {version = ">=0.2.0", markers = "sys_platform == \"win32\""}
+SecretStorage = {version = ">=3.2", markers = "sys_platform == \"linux\""}
+
+[package.extras]
+completion = ["shtab"]
+docs = ["furo", "jaraco.packaging (>=9)", "jaraco.tidelift (>=1.4)", "rst.linker (>=1.9)", "sphinx (>=3.5)"]
+testing = ["flake8 (<5)", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)"]
+
[[package]]
name = "lark-parser"
version = "0.11.3"
description = "a modern parsing library"
-category = "dev"
+category = "main"
optional = false
python-versions = "*"
files = [
@@ -1127,6 +1231,18 @@ files = [
requests = ">=2.2.1,<3.0"
six = ">=1.9.0,<2.0"
+[[package]]
+name = "more-itertools"
+version = "9.1.0"
+description = "More routines for operating on iterables, beyond itertools"
+category = "main"
+optional = false
+python-versions = ">=3.7"
+files = [
+ {file = "more-itertools-9.1.0.tar.gz", hash = "sha256:cabaa341ad0389ea83c17a94566a53ae4c9d07349861ecb14dc6d0345cf9ac5d"},
+ {file = "more_itertools-9.1.0-py3-none-any.whl", hash = "sha256:d2bc7f02446e86a68911e58ded76d6561eea00cddfb2a91e7019bbb586c799f3"},
+]
+
[[package]]
name = "msgpack"
version = "1.0.4"
@@ -1419,7 +1535,7 @@ ssv = ["swagger-spec-validator (>=2.4,<3.0)"]
name = "preql"
version = "0.2.19"
description = "An interpreted relational query language that compiles to SQL"
-category = "dev"
+category = "main"
optional = false
python-versions = ">=3.6,<4.0"
files = [
@@ -1468,7 +1584,7 @@ tests = ["google-auth", "httpretty", "pytest", "pytest-runner", "requests-kerber
name = "prompt-toolkit"
version = "3.0.36"
description = "Library for building powerful interactive command lines in Python"
-category = "dev"
+category = "main"
optional = false
python-versions = ">=3.6.2"
files = [
@@ -1481,26 +1597,25 @@ wcwidth = "*"
[[package]]
name = "protobuf"
-version = "4.21.12"
+version = "4.22.3"
description = ""
category = "main"
optional = false
python-versions = ">=3.7"
files = [
- {file = "protobuf-4.21.12-cp310-abi3-win32.whl", hash = "sha256:b135410244ebe777db80298297a97fbb4c862c881b4403b71bac9d4107d61fd1"},
- {file = "protobuf-4.21.12-cp310-abi3-win_amd64.whl", hash = "sha256:89f9149e4a0169cddfc44c74f230d7743002e3aa0b9472d8c28f0388102fc4c2"},
- {file = "protobuf-4.21.12-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:299ea899484ee6f44604deb71f424234f654606b983cb496ea2a53e3c63ab791"},
- {file = "protobuf-4.21.12-cp37-abi3-manylinux2014_aarch64.whl", hash = "sha256:d1736130bce8cf131ac7957fa26880ca19227d4ad68b4888b3be0dea1f95df97"},
- {file = "protobuf-4.21.12-cp37-abi3-manylinux2014_x86_64.whl", hash = "sha256:78a28c9fa223998472886c77042e9b9afb6fe4242bd2a2a5aced88e3f4422aa7"},
- {file = "protobuf-4.21.12-cp37-cp37m-win32.whl", hash = "sha256:3d164928ff0727d97022957c2b849250ca0e64777ee31efd7d6de2e07c494717"},
- {file = "protobuf-4.21.12-cp37-cp37m-win_amd64.whl", hash = "sha256:f45460f9ee70a0ec1b6694c6e4e348ad2019275680bd68a1d9314b8c7e01e574"},
- {file = "protobuf-4.21.12-cp38-cp38-win32.whl", hash = "sha256:6ab80df09e3208f742c98443b6166bcb70d65f52cfeb67357d52032ea1ae9bec"},
- {file = "protobuf-4.21.12-cp38-cp38-win_amd64.whl", hash = "sha256:1f22ac0ca65bb70a876060d96d914dae09ac98d114294f77584b0d2644fa9c30"},
- {file = "protobuf-4.21.12-cp39-cp39-win32.whl", hash = "sha256:27f4d15021da6d2b706ddc3860fac0a5ddaba34ab679dc182b60a8bb4e1121cc"},
- {file = "protobuf-4.21.12-cp39-cp39-win_amd64.whl", hash = "sha256:237216c3326d46808a9f7c26fd1bd4b20015fb6867dc5d263a493ef9a539293b"},
- {file = "protobuf-4.21.12-py2.py3-none-any.whl", hash = "sha256:a53fd3f03e578553623272dc46ac2f189de23862e68565e83dde203d41b76fc5"},
- {file = "protobuf-4.21.12-py3-none-any.whl", hash = "sha256:b98d0148f84e3a3c569e19f52103ca1feacdac0d2df8d6533cf983d1fda28462"},
- {file = "protobuf-4.21.12.tar.gz", hash = "sha256:7cd532c4566d0e6feafecc1059d04c7915aec8e182d1cf7adee8b24ef1e2e6ab"},
+ {file = "protobuf-4.22.3-cp310-abi3-win32.whl", hash = "sha256:8b54f56d13ae4a3ec140076c9d937221f887c8f64954673d46f63751209e839a"},
+ {file = "protobuf-4.22.3-cp310-abi3-win_amd64.whl", hash = "sha256:7760730063329d42a9d4c4573b804289b738d4931e363ffbe684716b796bde51"},
+ {file = "protobuf-4.22.3-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:d14fc1a41d1a1909998e8aff7e80d2a7ae14772c4a70e4bf7db8a36690b54425"},
+ {file = "protobuf-4.22.3-cp37-abi3-manylinux2014_aarch64.whl", hash = "sha256:70659847ee57a5262a65954538088a1d72dfc3e9882695cab9f0c54ffe71663b"},
+ {file = "protobuf-4.22.3-cp37-abi3-manylinux2014_x86_64.whl", hash = "sha256:13233ee2b9d3bd9a5f216c1fa2c321cd564b93d8f2e4f521a85b585447747997"},
+ {file = "protobuf-4.22.3-cp37-cp37m-win32.whl", hash = "sha256:ecae944c6c2ce50dda6bf76ef5496196aeb1b85acb95df5843cd812615ec4b61"},
+ {file = "protobuf-4.22.3-cp37-cp37m-win_amd64.whl", hash = "sha256:d4b66266965598ff4c291416be429cef7989d8fae88b55b62095a2331511b3fa"},
+ {file = "protobuf-4.22.3-cp38-cp38-win32.whl", hash = "sha256:f08aa300b67f1c012100d8eb62d47129e53d1150f4469fd78a29fa3cb68c66f2"},
+ {file = "protobuf-4.22.3-cp38-cp38-win_amd64.whl", hash = "sha256:f2f4710543abec186aee332d6852ef5ae7ce2e9e807a3da570f36de5a732d88e"},
+ {file = "protobuf-4.22.3-cp39-cp39-win32.whl", hash = "sha256:7cf56e31907c532e460bb62010a513408e6cdf5b03fb2611e4b67ed398ad046d"},
+ {file = "protobuf-4.22.3-cp39-cp39-win_amd64.whl", hash = "sha256:e0e630d8e6a79f48c557cd1835865b593d0547dce221c66ed1b827de59c66c97"},
+ {file = "protobuf-4.22.3-py3-none-any.whl", hash = "sha256:52f0a78141078077cfe15fe333ac3e3a077420b9a3f5d1bf9b5fe9d286b4d881"},
+ {file = "protobuf-4.22.3.tar.gz", hash = "sha256:23452f2fdea754a8251d0fc88c0317735ae47217e0d27bf330a30eec2848811a"},
]
[[package]]
@@ -1628,15 +1743,15 @@ dotenv = ["python-dotenv (>=0.10.4)"]
email = ["email-validator (>=1.0.3)"]
[[package]]
-name = "Pygments"
-version = "2.13.0"
+name = "pygments"
+version = "2.15.1"
description = "Pygments is a syntax highlighting package written in Python."
category = "main"
optional = false
-python-versions = ">=3.6"
+python-versions = ">=3.7"
files = [
- {file = "Pygments-2.13.0-py3-none-any.whl", hash = "sha256:f643f331ab57ba3c9d89212ee4a2dabc6e94f117cf4eefde99a0574720d14c42"},
- {file = "Pygments-2.13.0.tar.gz", hash = "sha256:56a8508ae95f98e2b9bdf93a6be5ae3f7d8af858b43e02c5a2ff083726be40c1"},
+ {file = "Pygments-2.15.1-py3-none-any.whl", hash = "sha256:db2db3deb4b4179f399a09054b023b6a586b76499d36965813c71aa8ed7b5fd1"},
+ {file = "Pygments-2.15.1.tar.gz", hash = "sha256:8ace4d3c1dd481894b2005f560ead0f9f19ee64fe983366be1a21e171d12775c"},
]
[package.extras]
@@ -1819,6 +1934,18 @@ files = [
"backports.zoneinfo" = {version = "*", markers = "python_version >= \"3.6\" and python_version < \"3.9\""}
tzdata = {version = "*", markers = "python_version >= \"3.6\""}
+[[package]]
+name = "pywin32-ctypes"
+version = "0.2.0"
+description = ""
+category = "main"
+optional = false
+python-versions = "*"
+files = [
+ {file = "pywin32-ctypes-0.2.0.tar.gz", hash = "sha256:24ffc3b341d457d48e8922352130cf2644024a4ff09762a2261fd34c36ee5942"},
+ {file = "pywin32_ctypes-0.2.0-py2.py3-none-any.whl", hash = "sha256:9dc2d991b3479cc2df15930958b674a48a227d5361d413827a4cfd0b5876fc98"},
+]
+
[[package]]
name = "pyyaml"
version = "6.0"
@@ -1988,6 +2115,22 @@ files = [
{file = "runtype-0.2.7.tar.gz", hash = "sha256:5a9e1212846b3e54d4ba29fd7db602af5544a2a4253d1f8d829087214a8766ad"},
]
+[[package]]
+name = "secretstorage"
+version = "3.3.3"
+description = "Python bindings to FreeDesktop.org Secret Service API"
+category = "main"
+optional = false
+python-versions = ">=3.6"
+files = [
+ {file = "SecretStorage-3.3.3-py3-none-any.whl", hash = "sha256:f356e6628222568e3af06f2eba8df495efa13b3b63081dafd4f7d9a7b7bc9f99"},
+ {file = "SecretStorage-3.3.3.tar.gz", hash = "sha256:2403533ef369eca6d2ba81718576c5e0f564d5cca1b58f73a8b23e7d4eeebd77"},
+]
+
+[package.dependencies]
+cryptography = ">=2.0"
+jeepney = ">=0.6"
+
[[package]]
name = "semver"
version = "2.13.0"
@@ -2076,46 +2219,37 @@ pandas = ["pandas (>=1.0.0,<1.5.0)", "pyarrow (>=6.0.0,<6.1.0)"]
secure-local-storage = ["keyring (!=16.1.0,<24.0.0)"]
[[package]]
-name = "sqeleton"
-version = "0.0.7"
-description = "Python library for querying SQL databases"
+name = "sqlparse"
+version = "0.4.4"
+description = "A non-validating SQL parser."
category = "main"
optional = false
-python-versions = ">=3.7,<4.0"
+python-versions = ">=3.5"
files = [
- {file = "sqeleton-0.0.7-py3-none-any.whl", hash = "sha256:9e16deb240e675af3facdd57de0264546507507728cad1f6fdba3922428891f4"},
- {file = "sqeleton-0.0.7.tar.gz", hash = "sha256:cbfb7f2689ad54b1cb528e67d0954ac1a18ab1f1f757883b62f4e5c9dce0b468"},
+ {file = "sqlparse-0.4.4-py3-none-any.whl", hash = "sha256:5430a4fe2ac7d0f93e66f1efc6e1338a41884b7ddf2a350cedd20ccc4d9d28f3"},
+ {file = "sqlparse-0.4.4.tar.gz", hash = "sha256:d446183e84b8349fa3061f0fe7f06ca94ba65b426946ffebe6e3e8295332420c"},
]
-[package.dependencies]
-click = ">=8.1,<9.0"
-dsnparse = "*"
-rich = "*"
-runtype = ">=0.2.6,<0.3.0"
-toml = ">=0.10.2,<0.11.0"
-
[package.extras]
-clickhouse = ["clickhouse-driver"]
-duckdb = ["duckdb (>=0.7.0,<0.8.0)"]
-mysql = ["mysql-connector-python (==8.0.29)"]
-postgresql = ["psycopg2"]
-presto = ["presto-python-client"]
-snowflake = ["cryptography", "snowflake-connector-python (>=2.7.2,<3.0.0)"]
-trino = ["trino (>=0.314.0,<0.315.0)"]
-tui = ["textual (>=0.9.1,<0.10.0)", "textual-select"]
+dev = ["build", "flake8"]
+doc = ["sphinx"]
+test = ["pytest", "pytest-cov"]
[[package]]
-name = "sqlparse"
-version = "0.4.3"
-description = "A non-validating SQL parser."
+name = "tabulate"
+version = "0.9.0"
+description = "Pretty-print tabular data"
category = "main"
optional = false
-python-versions = ">=3.5"
+python-versions = ">=3.7"
files = [
- {file = "sqlparse-0.4.3-py3-none-any.whl", hash = "sha256:0323c0ec29cd52bceabc1b4d9d579e311f3e4961b98d174201d5622a23b85e34"},
- {file = "sqlparse-0.4.3.tar.gz", hash = "sha256:69ca804846bb114d2ec380e4360a8a340db83f0ccf3afceeb1404df028f57268"},
+ {file = "tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f"},
+ {file = "tabulate-0.9.0.tar.gz", hash = "sha256:0095b12bf5966de529c0feb1fa08671671b3368eec77d7ef7ab114be2c068b3c"},
]
+[package.extras]
+widechars = ["wcwidth"]
+
[[package]]
name = "text-unidecode"
version = "1.3"
@@ -2289,14 +2423,14 @@ socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"]
[[package]]
name = "vertica-python"
-version = "1.1.1"
+version = "1.3.2"
description = "Official native Python client for the Vertica database."
-category = "dev"
+category = "main"
optional = false
-python-versions = "*"
+python-versions = ">=3.7"
files = [
- {file = "vertica-python-1.1.1.tar.gz", hash = "sha256:dedf56d76b67673b4d57a13f7f96ebdc57b39ea650b93ebf0c05eb6d1d2c0c05"},
- {file = "vertica_python-1.1.1-py2.py3-none-any.whl", hash = "sha256:63d300832d6fe471987880f06a9590eafc46a1f896860881270f6b6645f3bec6"},
+ {file = "vertica-python-1.3.2.tar.gz", hash = "sha256:3664f0610c16cd5d606b8bbe5c0a50359b1e979d9383c07967bbd49d4990a02f"},
+ {file = "vertica_python-1.3.2-py3-none-any.whl", hash = "sha256:e0096e430c0f8249dae0e15c1dd0d28a4e7fcd5216e12643d48eadc8e1cb0540"},
]
[package.dependencies]
@@ -2307,7 +2441,7 @@ six = ">=1.10.0"
name = "wcwidth"
version = "0.2.5"
description = "Measures the displayed width of unicode strings in a terminal"
-category = "dev"
+category = "main"
optional = false
python-versions = "*"
files = [
@@ -2351,15 +2485,15 @@ clickhouse = ["clickhouse-driver"]
dbt = ["dbt-artifacts-parser", "dbt-core"]
duckdb = ["duckdb"]
mysql = ["mysql-connector-python"]
-oracle = []
+oracle = ["cx_Oracle"]
postgresql = ["psycopg2"]
-preql = []
+preql = ["preql"]
presto = ["presto-python-client"]
snowflake = ["cryptography", "snowflake-connector-python"]
trino = ["trino"]
-vertica = []
+vertica = ["vertica-python"]
[metadata]
lock-version = "2.0"
python-versions = "^3.7"
-content-hash = "b3c5efa8d2648d5847390a3e9dfd8e66828f6e352df2258361184c6abe6be08e"
+content-hash = "96bcb369e66de27b5ad8e86337bcf7af471c8b2c84b748c3b9ddd4ee8155c001"
diff --git a/pyproject.toml b/pyproject.toml
index 0fecd2af..1f305ebb 100755
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "data-diff"
-version = "0.5.0"
+version = "0.7.5"
description = "Command-line tool and Python library to efficiently diff rows across two different databases."
authors = ["Datafold "]
license = "MIT"
@@ -25,11 +25,13 @@ packages = [{ include = "data_diff" }]
[tool.poetry.dependencies]
python = "^3.7"
runtype = "^0.2.6"
-dsnparse = "*"
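+# dsnparse: pin below 0.2.0 on Python 3.7, allow the latest release on 3.8+ (see markers below)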
+dsnparse = [
+ { version = "<0.2.0", markers = "python_version < '3.8.0'" },
+ { version = "*", markers = "python_version >= '3.8.0'" }
+]
click = "^8.1"
rich = "*"
toml = "^0.10.2"
-sqeleton = "^0.0.7"
mysql-connector-python = {version="8.0.29", optional=true}
psycopg2 = {version="*", optional=true}
snowflake-connector-python = {version="^2.7.2", optional=true}
@@ -38,8 +40,13 @@ trino = {version="^0.314.0", optional=true}
presto-python-client = {version="*", optional=true}
clickhouse-driver = {version="*", optional=true}
duckdb = {version="^0.7.0", optional=true}
-dbt-artifacts-parser = {version="^0.2.5", optional=true}
+dbt-artifacts-parser = {version="^0.3.0", optional=true}
dbt-core = {version="^1.0.0", optional=true}
+keyring = "*"
+tabulate = "^0.9.0"
+preql = {version="^0.2.19", optional=true}
+cx_Oracle = {version="*", optional=true}
+vertica-python = {version="*", optional=true}
[tool.poetry.dev-dependencies]
parameterized = "*"
@@ -54,7 +61,7 @@ presto-python-client = "*"
clickhouse-driver = "*"
vertica-python = "*"
duckdb = "^0.7.0"
-dbt-artifacts-parser = "^0.2.5"
+dbt-artifacts-parser = "^0.3.0"
dbt-core = "^1.0.0"
# google-cloud-bigquery = "*"
# databricks-sql-connector = "*"
@@ -80,3 +87,7 @@ build-backend = "poetry.core.masonry.api"
[tool.poetry.scripts]
data-diff = 'data_diff.__main__:main'
+sqeleton = 'data_diff.sqeleton.__main__:main'
+
+[tool.black]
+line-length = 120
diff --git a/tests/cloud/files/data_source_list_response.json b/tests/cloud/files/data_source_list_response.json
new file mode 100644
index 00000000..62b50933
--- /dev/null
+++ b/tests/cloud/files/data_source_list_response.json
@@ -0,0 +1,232 @@
+[
+ {
+ "id": 3,
+ "name": "BigQuery",
+ "type": "bigquery",
+ "is_paused": false,
+ "hidden": false,
+ "temp_schema": "marine-potion-312409.ilia_dev",
+ "disable_schema_indexing": false,
+ "disable_profiling": false,
+ "catalog_include_list": "",
+ "catalog_exclude_list": "",
+ "schema_indexing_schedule": null,
+ "schema_max_age_s": null,
+ "profile_schedule": null,
+ "profile_exclude_list": "",
+ "profile_include_list": "",
+ "discourage_manual_profiling": false,
+ "lineage_schedule": null,
+ "float_tolerance": 0.0001,
+ "options": null,
+ "queue_name": null,
+ "scheduled_queue_name": null,
+ "groups": null,
+ "view_only": false,
+ "created_from": null,
+ "source": null,
+ "max_allowed_connections": 25,
+ "last_test": null,
+ "secret_id": null
+ },
+ {
+ "id": 4,
+ "name": "Databricks",
+ "type": "databricks",
+ "is_paused": false,
+ "hidden": false,
+ "temp_schema": "hive_metastore.ilia_dev",
+ "disable_schema_indexing": false,
+ "disable_profiling": false,
+ "catalog_include_list": "hive_metastore.ilia_dev.*",
+ "catalog_exclude_list": "",
+ "schema_indexing_schedule": null,
+ "schema_max_age_s": null,
+ "profile_schedule": null,
+ "profile_exclude_list": "",
+ "profile_include_list": "",
+ "discourage_manual_profiling": false,
+ "lineage_schedule": null,
+ "float_tolerance": 0.0,
+ "options": null,
+ "queue_name": null,
+ "scheduled_queue_name": null,
+ "groups": null,
+ "view_only": false,
+ "created_from": null,
+ "source": null,
+ "max_allowed_connections": 25,
+ "last_test": null,
+ "secret_id": null
+ },
+ {
+ "id": 5,
+ "name": "MySQL",
+ "type": "mysql",
+ "is_paused": false,
+ "hidden": false,
+ "temp_schema": "temporary_test_schema",
+ "disable_schema_indexing": false,
+ "disable_profiling": false,
+ "catalog_include_list": "",
+ "catalog_exclude_list": "",
+ "schema_indexing_schedule": null,
+ "schema_max_age_s": null,
+ "profile_schedule": null,
+ "profile_exclude_list": "",
+ "profile_include_list": "",
+ "discourage_manual_profiling": false,
+ "lineage_schedule": null,
+ "float_tolerance": 1e-7,
+ "options": null,
+ "queue_name": null,
+ "scheduled_queue_name": null,
+ "groups": null,
+ "view_only": false,
+ "created_from": null,
+ "source": null,
+ "max_allowed_connections": 25,
+ "last_test": null,
+ "secret_id": null
+ },
+ {
+ "id": 2,
+ "name": "PostgreSQL",
+ "type": "pg",
+ "is_paused": false,
+ "hidden": false,
+ "temp_schema": "postgres.datafold_tmp",
+ "disable_schema_indexing": false,
+ "disable_profiling": false,
+ "catalog_include_list": "",
+ "catalog_exclude_list": "",
+ "schema_indexing_schedule": "*/30 * * * *",
+ "schema_max_age_s": 780,
+ "profile_schedule": null,
+ "profile_exclude_list": "",
+ "profile_include_list": "",
+ "discourage_manual_profiling": false,
+ "lineage_schedule": null,
+ "float_tolerance": 0.000001,
+ "options": null,
+ "queue_name": null,
+ "scheduled_queue_name": null,
+ "groups": null,
+ "view_only": false,
+ "created_from": null,
+ "source": null,
+ "max_allowed_connections": 10,
+ "last_test": {
+ "tested_at": "2023-03-29T11:02:05.025096+00:00",
+ "results": [
+ {
+ "step": "lineage_download",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "No lineage downloader for this data source",
+ "outcome": "skipped"
+ }
+ },
+ {
+ "step": "schema_download",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "Discovered 6 tables",
+ "outcome": "success"
+ }
+ },
+ {
+ "step": "temp_schema",
+ "status": "done",
+ "result": {
+ "code": "ERROR",
+ "message": "Unable to create table \"postgres\".\"datafold_tmp\".\"test_connection\": schema \"datafold_tmp\" does not exist\n",
+ "outcome": "failed"
+ }
+ },
+ {
+ "step": "connection",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "Connected to the database",
+ "outcome": "success"
+ }
+ }
+ ]
+ },
+ "secret_id": null
+ },
+ {
+ "id": 1,
+ "name": "Snowflake",
+ "type": "snowflake",
+ "is_paused": false,
+ "hidden": false,
+ "temp_schema": "INTEGRATION.ILIA_DEV",
+ "disable_schema_indexing": false,
+ "disable_profiling": false,
+ "catalog_include_list": "*.ILIA_DEV.*\n*.BEERS_ILIA.*\n*.BEERS_ILIA_DEV.*",
+ "catalog_exclude_list": "",
+ "schema_indexing_schedule": null,
+ "schema_max_age_s": null,
+ "profile_schedule": null,
+ "profile_exclude_list": "",
+ "profile_include_list": "",
+ "discourage_manual_profiling": true,
+ "lineage_schedule": null,
+ "float_tolerance": 1e-7,
+ "options": null,
+ "queue_name": null,
+ "scheduled_queue_name": null,
+ "groups": null,
+ "view_only": false,
+ "created_from": null,
+ "source": null,
+ "max_allowed_connections": 25,
+ "last_test": {
+ "tested_at": "2023-03-29T11:11:05.219534+00:00",
+ "results": [
+ {
+ "step": "lineage_download",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "Lineage fetched",
+ "outcome": "success"
+ }
+ },
+ {
+ "step": "schema_download",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "Discovered 12493 tables",
+ "outcome": "success"
+ }
+ },
+ {
+ "step": "temp_schema",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "Schema \"INTEGRATION\".\"ILIA_DEV\" is writeable",
+ "outcome": "success"
+ }
+ },
+ {
+ "step": "connection",
+ "status": "done",
+ "result": {
+ "code": "OK",
+ "message": "Connected to the database",
+ "outcome": "success"
+ }
+ }
+ ]
+ },
+ "secret_id": null
+ }
+]
diff --git a/tests/cloud/files/data_source_schema_config_response.json b/tests/cloud/files/data_source_schema_config_response.json
new file mode 100644
index 00000000..ad7a2510
--- /dev/null
+++ b/tests/cloud/files/data_source_schema_config_response.json
@@ -0,0 +1,548 @@
+[
+ {
+ "name": "BigQuery",
+ "type": "bigquery",
+ "configuration_schema": {
+ "title": "BigQueryConfig",
+ "type": "object",
+ "properties": {
+ "projectId": {
+ "title": "Project ID",
+ "section": "basic",
+ "type": "string"
+ },
+ "jsonKeyFile": {
+ "title": "JSON Key File",
+ "section": "basic",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "useStandardSql": {
+ "title": "Use Standard SQL",
+ "default": true,
+ "section": "config",
+ "type": "boolean"
+ },
+ "location": {
+ "title": "Processing Location",
+ "default": "US",
+ "section": "basic",
+ "examples": [
+ "US"
+ ],
+ "type": "string"
+ },
+ "totalMBytesProcessedLimit": {
+ "title": "Scanned Data Limit (MB)",
+ "section": "config",
+ "type": "integer"
+ },
+ "userDefinedFunctionResourceUri": {
+ "title": "UDF Source URIs",
+ "section": "config",
+ "example": "gs://bucket/date_utils.js",
+ "type": "string"
+ },
+ "extraProjectsToIndex": {
+ "title": "List of extra projects to index (one per line)",
+ "examples": [
+ "project1\nproject2"
+ ],
+ "section": "config",
+ "widget": "multiline",
+ "type": "string"
+ }
+ },
+ "required": [
+ "projectId",
+ "jsonKeyFile"
+ ],
+ "secret": [
+ "jsonKeyFile"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling",
+ "lineage",
+ "timetravel"
+ ]
+ },
+ {
+ "name": "Databricks",
+ "type": "databricks",
+ "configuration_schema": {
+ "title": "DatabricksConfig",
+ "type": "object",
+ "properties": {
+ "host": {
+ "title": "Host",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "http_path": {
+ "title": "HTTP Path",
+ "default": "",
+ "type": "string"
+ },
+ "http_password": {
+ "title": "Access Token",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "database": {
+ "title": "Database",
+ "type": "string"
+ }
+ },
+ "required": [
+ "host",
+ "http_password",
+ "database"
+ ],
+ "secret": [
+ "http_password"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling",
+ "lineage"
+ ]
+ },
+ {
+ "name": "MySQL",
+ "type": "mysql",
+ "configuration_schema": {
+ "title": "MysqlConfig",
+ "type": "object",
+ "properties": {
+ "host": {
+ "title": "Host",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "port": {
+ "title": "Port",
+ "default": 3306,
+ "type": "integer"
+ },
+ "user": {
+ "title": "User",
+ "type": "string"
+ },
+ "passwd": {
+ "title": "Passwd",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "db": {
+ "title": "Database name",
+ "type": "string"
+ }
+ },
+ "required": [
+ "host",
+ "user",
+ "passwd",
+ "db"
+ ],
+ "secret": [
+ "passwd"
+ ]
+ },
+ "features": [
+ "profiling"
+ ]
+ },
+ {
+ "name": "PostgreSQL",
+ "type": "pg",
+ "configuration_schema": {
+ "title": "PostgreSQLConfig",
+ "type": "object",
+ "properties": {
+ "host": {
+ "title": "Host",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "port": {
+ "title": "Port",
+ "default": 5432,
+ "type": "integer"
+ },
+ "user": {
+ "title": "User",
+ "type": "string"
+ },
+ "password": {
+ "title": "Password",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "role": {
+ "title": "Role (case sensitive)",
+ "type": "string"
+ },
+ "dbname": {
+ "title": "Database Name",
+ "type": "string"
+ },
+ "sslmode": {
+ "title": "SSL Mode",
+ "default": "prefer",
+ "section": "config",
+ "type": "string"
+ }
+ },
+ "required": [
+ "host",
+ "user",
+ "dbname"
+ ],
+ "secret": [
+ "password"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling"
+ ]
+ },
+ {
+ "name": "PostgreSQLAurora",
+ "type": "postgres_aurora",
+ "configuration_schema": {
+ "title": "PostgreSQLAuroraConfig",
+ "type": "object",
+ "properties": {
+ "host": {
+ "title": "Host",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "port": {
+ "title": "Port",
+ "default": 5432,
+ "type": "integer"
+ },
+ "user": {
+ "title": "User",
+ "type": "string"
+ },
+ "password": {
+ "title": "Password",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "role": {
+ "title": "Role (case sensitive)",
+ "type": "string"
+ },
+ "dbname": {
+ "title": "Database Name",
+ "type": "string"
+ },
+ "sslmode": {
+ "title": "SSL Mode",
+ "default": "prefer",
+ "section": "config",
+ "type": "string"
+ },
+ "aws_access_key_id": {
+ "title": "AWS Access Key",
+ "type": "string"
+ },
+ "aws_secret_access_key": {
+ "title": "AWS Secret",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "aws_region": {
+ "title": "AWS Region",
+ "type": "string"
+ },
+ "aws_cloudwatch_log_group": {
+ "title": "Cloudwatch Postgres Log Group",
+ "type": "string"
+ },
+ "keep_alive": {
+ "title": "Keep Alive timeout in seconds, leave empty to disable",
+ "type": "integer"
+ }
+ },
+ "required": [
+ "host",
+ "user",
+ "dbname"
+ ],
+ "secret": [
+ "password",
+ "aws_secret_access_key"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling",
+ "lineage"
+ ]
+ },
+ {
+ "name": "PostgreSQLRDS",
+ "type": "postgres_aws_rds",
+ "configuration_schema": {
+ "title": "PostgreSQLAuroraConfig",
+ "type": "object",
+ "properties": {
+ "host": {
+ "title": "Host",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "port": {
+ "title": "Port",
+ "default": 5432,
+ "type": "integer"
+ },
+ "user": {
+ "title": "User",
+ "type": "string"
+ },
+ "password": {
+ "title": "Password",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "role": {
+ "title": "Role (case sensitive)",
+ "type": "string"
+ },
+ "dbname": {
+ "title": "Database Name",
+ "type": "string"
+ },
+ "sslmode": {
+ "title": "SSL Mode",
+ "default": "prefer",
+ "section": "config",
+ "type": "string"
+ },
+ "aws_access_key_id": {
+ "title": "AWS Access Key",
+ "type": "string"
+ },
+ "aws_secret_access_key": {
+ "title": "AWS Secret",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "aws_region": {
+ "title": "AWS Region",
+ "type": "string"
+ },
+ "aws_cloudwatch_log_group": {
+ "title": "Cloudwatch Postgres Log Group",
+ "type": "string"
+ },
+ "keep_alive": {
+ "title": "Keep Alive timeout in seconds, leave empty to disable",
+ "type": "integer"
+ }
+ },
+ "required": [
+ "host",
+ "user",
+ "dbname"
+ ],
+ "secret": [
+ "password",
+ "aws_secret_access_key"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling",
+ "lineage"
+ ]
+ },
+ {
+ "name": "Redshift",
+ "type": "redshift",
+ "configuration_schema": {
+ "title": "RedshiftConfig",
+ "type": "object",
+ "properties": {
+ "host": {
+ "title": "Host",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "port": {
+ "title": "Port",
+ "default": 5432,
+ "type": "integer"
+ },
+ "user": {
+ "title": "User",
+ "type": "string"
+ },
+ "password": {
+ "title": "Password",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "role": {
+ "title": "Role (case sensitive)",
+ "type": "string"
+ },
+ "dbname": {
+ "title": "Database Name",
+ "type": "string"
+ },
+ "sslmode": {
+ "title": "SSL Mode",
+ "default": "prefer",
+ "section": "config",
+ "type": "string"
+ },
+ "adhoc_query_group": {
+ "title": "Query Group for Adhoc Queries",
+ "default": "default",
+ "section": "config",
+ "type": "string"
+ },
+ "scheduled_query_group": {
+ "title": "Query Group for Scheduled Queries",
+ "default": "default",
+ "section": "config",
+ "type": "string"
+ }
+ },
+ "required": [
+ "host",
+ "user",
+ "dbname"
+ ],
+ "secret": [
+ "password"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling",
+ "lineage"
+ ]
+ },
+ {
+ "name": "Snowflake",
+ "type": "snowflake",
+ "configuration_schema": {
+ "title": "SnowflakeConfig",
+ "type": "object",
+ "properties": {
+ "account": {
+ "title": "Account",
+ "maxLength": 128,
+ "type": "string"
+ },
+ "user": {
+ "title": "User",
+ "default": "DATAFOLD",
+ "type": "string"
+ },
+ "password": {
+ "title": "Password",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "keyPairFile": {
+ "title": "Key Pair file (private-key)",
+ "type": "string",
+ "writeOnly": true,
+ "format": "password"
+ },
+ "warehouse": {
+ "title": "Warehouse (case sensitive)",
+ "default": "COMPUTE_WH",
+ "examples": [
+ "COMPUTE_WH"
+ ],
+ "type": "string"
+ },
+ "role": {
+ "title": "Role (case sensitive)",
+ "default": "DATAFOLDROLE",
+ "examples": [
+ "PUBLIC"
+ ],
+ "type": "string"
+ },
+ "default_db": {
+ "title": "Default DB (case sensitive)",
+ "examples": [
+ "MY_DB"
+ ],
+ "type": "string"
+ },
+ "default_schema": {
+ "title": "Default schema (case sensitive)",
+ "default": "PUBLIC",
+ "examples": [
+ "PUBLIC"
+ ],
+ "section": "config",
+ "type": "string"
+ },
+ "region": {
+ "title": "Region",
+ "default": "us-west",
+ "section": "config",
+ "type": "string"
+ },
+ "metadata_database": {
+ "title": "Database containing metadata (usually SNOWFLAKE)",
+ "default": "SNOWFLAKE",
+ "section": "config",
+ "examples": [
+ "SNOWFLAKE"
+ ],
+ "type": "string"
+ },
+ "sql_variables": {
+ "title": "Session variables applied at every connection.",
+ "section": "config",
+ "widget": "multiline",
+ "examples": [
+ "variable_1=10\nvariable_2=test"
+ ],
+ "type": "string"
+ }
+ },
+ "required": [
+ "account",
+ "default_db"
+ ],
+ "secret": [
+ "password",
+ "keyPairFile"
+ ]
+ },
+ "features": [
+ "datadiff",
+ "profiling",
+ "lineage",
+ "timetravel"
+ ]
+ }
+]
diff --git a/tests/cloud/test_data_source.py b/tests/cloud/test_data_source.py
new file mode 100644
index 00000000..8c8cebe4
--- /dev/null
+++ b/tests/cloud/test_data_source.py
@@ -0,0 +1,409 @@
+from io import StringIO
+import json
+from pathlib import Path
+from parameterized import parameterized
+import unittest
+from unittest.mock import Mock, patch
+
+from data_diff.cloud.datafold_api import (
+ TCloudApiDataSourceConfigSchema,
+ TCloudApiDataSourceSchema,
+ TCloudApiDataSource,
+ TCloudApiDataSourceTestResult,
+ TCloudDataSourceTestResult,
+ TDsConfig,
+)
+from data_diff.cloud.data_source import (
+ TDataSourceTestStage,
+ TestDataSourceStatus,
+ create_ds_config,
+ _check_data_source_exists,
+ _get_temp_schema,
+ _test_data_source,
+)
+
+
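+# Expected TDsConfig objects, one per supported warehouse type, used as inputs and expected results in the parameterized tests below.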
+DATA_SOURCE_CONFIGS = {
+ "snowflake": TDsConfig(
+ name="ds_name",
+ type="snowflake",
+ options={
+ "account": "account",
+ "user": "user",
+ "password": "password",
+ "warehouse": "warehouse",
+ "role": "role",
+ "default_db": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+ "pg": TDsConfig(
+ name="ds_name",
+ type="pg",
+ options={
+ "host": "host",
+ "port": 5432,
+ "user": "user",
+ "password": "password",
+ "dbname": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+ "bigquery": TDsConfig(
+ name="ds_name",
+ type="bigquery",
+ options={
+ "projectId": "project_id",
+ "jsonKeyFile": '{"key1": "value1"}',
+ "location": "US",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+ "databricks": TDsConfig(
+ name="ds_name",
+ type="databricks",
+ options={
+ "host": "host",
+ "http_path": "some_http_path",
+ "http_password": "password",
+ "database": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+ "redshift": TDsConfig(
+ name="ds_name",
+ type="redshift",
+ options={
+ "host": "host",
+ "port": 5432,
+ "user": "user",
+ "password": "password",
+ "dbname": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+ "postgres_aurora": TDsConfig(
+ name="ds_name",
+ type="postgres_aurora",
+ options={
+ "host": "host",
+ "port": 5432,
+ "user": "user",
+ "password": "password",
+ "dbname": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+ "postgres_aws_rds": TDsConfig(
+ name="ds_name",
+ type="postgres_aws_rds",
+ options={
+ "host": "host",
+ "port": 5432,
+ "user": "user",
+ "password": "password",
+ "dbname": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ ),
+}
+
+
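+# name_func for @parameterized.expand: names each generated test case after the data source type.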
+def format_data_source_config_test(testcase_func, param_num, param):
+ (config,) = param.args
+ return f"{testcase_func.__name__}_{config.type}"
+
+
+class TestDataSource(unittest.TestCase):
+ def setUp(self) -> None:
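+ # Load canned API responses from tests/cloud/files and expose them through a mocked Datafold API client.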
+ with open(Path(__file__).parent / "files/data_source_schema_config_response.json", "r") as file:
+ self.data_source_schema = [
+ TCloudApiDataSourceConfigSchema(
+ name=item["name"],
+ db_type=item["type"],
+ config_schema=TCloudApiDataSourceSchema.from_orm(item),
+ )
+ for item in json.load(file)
+ ]
+
+ self.db_type_data_source_schemas = {ds_schema.db_type: ds_schema for ds_schema in self.data_source_schema}
+
+ with open(Path(__file__).parent / "files/data_source_list_response.json", "r") as file:
+ self.data_sources = [TCloudApiDataSource(**item) for item in json.load(file)]
+
+ self.api = Mock()
+ self.api.get_data_source_schema_config.return_value = self.data_source_schema
+ self.api.get_data_sources.return_value = self.data_sources
+
+ @parameterized.expand([(c,) for c in DATA_SOURCE_CONFIGS.values()], name_func=format_data_source_config_test)
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
+ def test_get_temp_schema(self, config: TDsConfig, mock_dbt_parser):
+ diff_vars = {
+ "prod_database": "db",
+ "prod_schema": "schema",
+ }
+ mock_dbt_parser.get_datadiff_variables.return_value = diff_vars
+ temp_schema = f'{diff_vars["prod_database"]}.{diff_vars["prod_schema"]}'
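+ # _get_temp_schema is expected to upper-case the schema for Snowflake and lower-case it for Postgres-compatible warehouses.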
+ if config.type == "snowflake":
+ temp_schema = temp_schema.upper()
+ elif config.type in {"pg", "postgres_aurora", "postgres_aws_rds", "redshift"}:
+ temp_schema = temp_schema.lower()
+
+ assert _get_temp_schema(dbt_parser=mock_dbt_parser, db_type=config.type) == temp_schema
+
+ @parameterized.expand([(c,) for c in DATA_SOURCE_CONFIGS.values()], name_func=format_data_source_config_test)
+ def test_create_ds_config(self, config: TDsConfig):
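+ # Answer every interactive prompt with the corresponding option value, then the temp schema and float tolerance.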
+ inputs = list(config.options.values()) + [config.temp_schema, config.float_tolerance]
+ with patch("rich.prompt.Console.input", side_effect=map(str, inputs)):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ )
+ self.assertEqual(actual_config, config)
+
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
+ def test_create_snowflake_ds_config_from_dbt_profiles(self, mock_dbt_parser):
+ config = DATA_SOURCE_CONFIGS["snowflake"]
+ mock_dbt_parser.get_connection_creds.return_value = (config.options,)
+ with patch("rich.prompt.Console.input", side_effect=["y", config.temp_schema, str(config.float_tolerance)]):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ dbt_parser=mock_dbt_parser,
+ )
+ self.assertEqual(actual_config, config)
+
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
+ def test_create_bigquery_ds_config_dbt_oauth(self, mock_dbt_parser):
+ config = DATA_SOURCE_CONFIGS["bigquery"]
+ mock_dbt_parser.get_connection_creds.return_value = (config.options,)
+ with patch("rich.prompt.Console.input", side_effect=["y", config.temp_schema, str(config.float_tolerance)]):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ dbt_parser=mock_dbt_parser,
+ )
+ self.assertEqual(actual_config, config)
+
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
+ @patch("data_diff.cloud.data_source._get_data_from_bigquery_json")
+ def test_create_bigquery_ds_config_dbt_service_account(self, mock_get_data_from_bigquery_json, mock_dbt_parser):
+ config = DATA_SOURCE_CONFIGS["bigquery"]
+
+ mock_get_data_from_bigquery_json.return_value = json.loads(config.options["jsonKeyFile"])
+ mock_dbt_parser.get_connection_creds.return_value = (
+ {
+ "type": "bigquery",
+ "method": "service-account",
+ "project": config.options["projectId"],
+ "threads": 1,
+ "keyfile": "/some/path",
+ },
+ )
+
+ with patch(
+ "rich.prompt.Console.input",
+ side_effect=["y", config.options["location"], config.temp_schema, str(config.float_tolerance)],
+ ):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ dbt_parser=mock_dbt_parser,
+ )
+ self.assertEqual(actual_config, config)
+
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
+ def test_create_bigquery_ds_config_dbt_service_account_json(self, mock_dbt_parser):
+ config = DATA_SOURCE_CONFIGS["bigquery"]
+
+ mock_dbt_parser.get_connection_creds.return_value = (
+ {
+ "type": "bigquery",
+ "method": "service-account-json",
+ "project": config.options["projectId"],
+ "threads": 1,
+ "keyfile_json": json.loads(config.options["jsonKeyFile"]),
+ },
+ )
+
+ with patch(
+ "rich.prompt.Console.input",
+ side_effect=["y", config.options["location"], config.temp_schema, str(config.float_tolerance)],
+ ):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ dbt_parser=mock_dbt_parser,
+ )
+ self.assertEqual(actual_config, config)
+
+ @patch("sys.stdout", new_callable=StringIO)
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
+ def test_create_ds_snowflake_config_from_dbt_profiles_one_param_passed_through_input(
+ self, mock_dbt_parser, mock_stdout
+ ):
+ config = DATA_SOURCE_CONFIGS["snowflake"]
+ options = {**config.options, "type": "snowflake"}
+ options["database"] = options.pop("default_db")
+ account = options.pop("account")
+ mock_dbt_parser.get_connection_creds.return_value = (options,)
+ with patch(
+ "rich.prompt.Console.input", side_effect=["y", account, config.temp_schema, str(config.float_tolerance)]
+ ):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ dbt_parser=mock_dbt_parser,
+ )
+ self.assertEqual(actual_config, config)
+ self.assertEqual(
+ mock_stdout.getvalue().strip(),
+ 'Cannot extract "account" from dbt profiles.yml. Please, type it manually',
+ )
+
+ @patch("sys.stdout", new_callable=StringIO)
+ def test_create_ds_config_validate_required_parameter(self, mock_stdout):
+ """
+ Validate "host" as an example of a required parameter; the same check
+ applies to any parameter that has no default value.
+ """
+
+ config = TDsConfig(
+ name="ds_name",
+ type="pg",
+ options={
+ "host": "host",
+ "port": 5432,
+ "user": "user",
+ "password": "password",
+ "dbname": "database",
+ },
+ float_tolerance=0.000001,
+ temp_schema="database.temp_schema",
+ )
+
+ inputs = ["", "host", 5432, "user", "password", "database", config.temp_schema, config.float_tolerance]
+ with patch("rich.prompt.Console.input", side_effect=map(str, inputs)):
+ actual_config = create_ds_config(
+ ds_config=self.db_type_data_source_schemas[config.type],
+ data_source_name=config.name,
+ )
+ self.assertEqual(actual_config, config)
+ self.assertEqual(mock_stdout.getvalue().strip(), "Parameter must not be empty")
+
+ def test_check_data_source_exists(self):
+ self.assertEqual(_check_data_source_exists(self.data_sources, self.data_sources[0].name), self.data_sources[0])
+
+ def test_check_data_source_not_exists(self):
+ self.assertEqual(_check_data_source_exists(self.data_sources, "ds_with_this_name_does_not_exist"), None)
+
+ @patch("data_diff.cloud.data_source.DatafoldAPI")
+ def test_data_source_all_tests_ok(self, mock_api: Mock):
+ mock_api.test_data_source.return_value = 1
+ mock_api.check_data_source_test_results.return_value = [
+ TCloudApiDataSourceTestResult(
+ name="lineage_download",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.SUCCESS,
+ message="No lineage downloader for this data source",
+ outcome="skipped",
+ ),
+ ),
+ TCloudApiDataSourceTestResult(
+ name="schema_download",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.SUCCESS, message="Discovered 6 tables", outcome="success"
+ ),
+ ),
+ TCloudApiDataSourceTestResult(
+ name="temp_schema",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.FAILED, message='Created table "database"."schema"', outcome="failed"
+ ),
+ ),
+ TCloudApiDataSourceTestResult(
+ name="connection",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.SUCCESS, message="Connected to the database", outcome="success"
+ ),
+ ),
+ ]
+
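+ # The skipped lineage_download step should not appear in the reported stages.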
+ expected_results = [
+ TDataSourceTestStage(
+ name="schema_download", status=TestDataSourceStatus.SUCCESS, description="Discovered 6 tables"
+ ),
+ TDataSourceTestStage(
+ name="temp_schema", status=TestDataSourceStatus.FAILED, description='Created table "database"."schema"'
+ ),
+ TDataSourceTestStage(
+ name="connection", status=TestDataSourceStatus.SUCCESS, description="Connected to the database"
+ ),
+ ]
+
+ self.assertEqual(_test_data_source(api=mock_api, data_source_id=1), expected_results)
+
+ @patch("data_diff.cloud.data_source.DatafoldAPI")
+ def test_data_source_one_test_failed(self, mock_api: Mock):
+ mock_api.test_data_source.return_value = 1
+ mock_api.check_data_source_test_results.return_value = [
+ TCloudApiDataSourceTestResult(
+ name="lineage_download",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.SUCCESS,
+ message="No lineage downloader for this data source",
+ outcome="skipped",
+ ),
+ ),
+ TCloudApiDataSourceTestResult(
+ name="schema_download",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.SUCCESS, message="Discovered 6 tables", outcome="success"
+ ),
+ ),
+ TCloudApiDataSourceTestResult(
+ name="temp_schema",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.FAILED,
+ message='Unable to create table "database"."schema"',
+ outcome="failed",
+ ),
+ ),
+ TCloudApiDataSourceTestResult(
+ name="connection",
+ status="done",
+ result=TCloudDataSourceTestResult(
+ status=TestDataSourceStatus.SUCCESS, message="Connected to the database", outcome="success"
+ ),
+ ),
+ ]
+
+ expected_results = [
+ TDataSourceTestStage(
+ name="schema_download", status=TestDataSourceStatus.SUCCESS, description="Discovered 6 tables"
+ ),
+ TDataSourceTestStage(
+ name="temp_schema",
+ status=TestDataSourceStatus.FAILED,
+ description='Unable to create table "database"."schema"',
+ ),
+ TDataSourceTestStage(
+ name="connection", status=TestDataSourceStatus.SUCCESS, description="Connected to the database"
+ ),
+ ]
+
+ self.assertEqual(_test_data_source(api=mock_api, data_source_id=1), expected_results)
diff --git a/tests/common.py b/tests/common.py
index f20bdeb8..7fa68d24 100644
--- a/tests/common.py
+++ b/tests/common.py
@@ -9,8 +9,8 @@
from parameterized import parameterized_class
-from sqeleton.queries import table
-from sqeleton.databases import Database
+from data_diff.sqeleton.queries import table
+from data_diff.sqeleton.databases import Database
from data_diff import databases as db
from data_diff import tracking
diff --git a/tests/dbt_artifacts/dbt_project.yml b/tests/dbt_artifacts/dbt_project.yml
new file mode 100644
index 00000000..cbc0dc97
--- /dev/null
+++ b/tests/dbt_artifacts/dbt_project.yml
@@ -0,0 +1,31 @@
+name: 'jaffle_shop'
+
+config-version: 2
+version: '0.1'
+
+profile: 'jaffle_shop'
+
+model-paths: ["models"]
+seed-paths: ["seeds"]
+test-paths: ["tests"]
+analysis-paths: ["analysis"]
+macro-paths: ["macros"]
+
+target-path: "target"
+clean-targets:
+ - "target"
+ - "dbt_modules"
+ - "logs"
+
+require-dbt-version: [">=1.0.0", "<2.0.0"]
+
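+# data-diff reads these vars to locate the production build of each model when diffing.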
+vars:
+ data_diff:
+ prod_database: jaffle_shop
+ prod_schema: prod
+
+models:
+ jaffle_shop:
+ materialized: table
+ staging:
+ materialized: view
diff --git a/tests/dbt_artifacts/jaffle_shop.duckdb b/tests/dbt_artifacts/jaffle_shop.duckdb
new file mode 100644
index 00000000..5c1ad7d9
Binary files /dev/null and b/tests/dbt_artifacts/jaffle_shop.duckdb differ
diff --git a/tests/dbt_artifacts/profiles.yml b/tests/dbt_artifacts/profiles.yml
new file mode 100644
index 00000000..5a4b0831
--- /dev/null
+++ b/tests/dbt_artifacts/profiles.yml
@@ -0,0 +1,7 @@
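+# dbt profile for the test project, backed by the local DuckDB file committed under tests/dbt_artifacts.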
+jaffle_shop:
+ target: dev
+ outputs:
+ dev:
+ type: duckdb
+ path: "./tests/dbt_artifacts/jaffle_shop.duckdb"
+ schema: dev
diff --git a/tests/dbt_artifacts/target/manifest.json b/tests/dbt_artifacts/target/manifest.json
new file mode 100644
index 00000000..f46aa254
--- /dev/null
+++ b/tests/dbt_artifacts/target/manifest.json
@@ -0,0 +1 @@
+{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v8.json", "dbt_version": "1.4.5", "generated_at": "2023-03-28T17:53:00.231489Z", "invocation_id": "289d7789-15b8-44be-a1f6-828f34858212", "env": {}, "project_id": "06e5b98c2db46f8a72cc4f66410e9b3b", "user_id": "1974995a-a39c-4b24-bacf-adfe12e92602", "send_anonymous_usage_stats": true, "adapter_type": "duckdb"}, "nodes": {"model.jaffle_shop.customers": {"database": "jaffle_shop", "schema": "dev", "name": "customers", "resource_type": "model", "package_name": "jaffle_shop", "path": "customers.sql", "original_file_path": "models/customers.sql", "unique_id": "model.jaffle_shop.customers", "fqn": ["jaffle_shop", "customers"], "alias": "customers", "checksum": {"name": "sha256", "checksum": "455b90a31f418ae776213ad9932c7cb72d19a5269a8c722bd9f4e44957313ce8"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "table", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "post-hook": [], "pre-hook": []}, "tags": [], "description": "This table has basic information about a customer, as well as some derived facts based on a customer's orders", "columns": {"customer_id": {"name": "customer_id", "description": "This is a unique identifier for a customer", "meta": {}, "data_type": null, "quote": null, "tags": ["primary-key"]}, "first_name": {"name": "first_name", "description": "Customer's first name. PII.", "meta": {}, "data_type": null, "quote": null, "tags": []}, "last_name": {"name": "last_name", "description": "Customer's last name. PII.", "meta": {}, "data_type": null, "quote": null, "tags": []}, "first_order": {"name": "first_order", "description": "Date (UTC) of a customer's first order", "meta": {}, "data_type": null, "quote": null, "tags": []}, "most_recent_order": {"name": "most_recent_order", "description": "Date (UTC) of a customer's most recent order", "meta": {}, "data_type": null, "quote": null, "tags": []}, "number_of_orders": {"name": "number_of_orders", "description": "Count of the number of orders a customer has placed", "meta": {}, "data_type": null, "quote": null, "tags": []}, "total_order_amount": {"name": "total_order_amount", "description": "Total value (AUD) of a customer's orders", "meta": {}, "data_type": null, "quote": null, "tags": []}}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": "jaffle_shop://models/schema.yml", "build_path": "target/run/jaffle_shop/models/customers.sql", "deferred": false, "unrendered_config": {"materialized": "table"}, "created_at": 1680025829.8450181, "relation_name": "\"jaffle_shop\".\"dev\".\"customers\"", "raw_code": "with customers as (\n\n select * from {{ ref('stg_customers') }}\n\n),\n\norders as (\n\n select * from {{ ref('stg_orders') }}\n\n),\n\npayments as (\n\n select * from {{ ref('stg_payments') }}\n\n),\n\ncustomer_orders as (\n\n select\n customer_id,\n\n min(order_date) as first_order,\n max(order_date) as most_recent_order,\n count(order_id) as number_of_orders\n from orders\n\n group by customer_id\n\n),\n\ncustomer_payments as (\n\n select\n orders.customer_id,\n sum(amount) as total_amount\n\n from payments\n\n left join orders on\n payments.order_id = orders.order_id\n\n group by orders.customer_id\n\n),\n\nfinal as (\n\n select\n customers.customer_id,\n customers.first_name,\n 
customers.last_name,\n customer_orders.first_order,\n customer_orders.most_recent_order,\n customer_orders.number_of_orders,\n customer_payments.total_amount as customer_lifetime_value\n\n from customers\n\n left join customer_orders\n on customers.customer_id = customer_orders.customer_id\n\n left join customer_payments\n on customers.customer_id = customer_payments.customer_id\n\n)\n\nselect * from final", "language": "sql", "refs": [["stg_customers"], ["stg_orders"], ["stg_payments"]], "sources": [], "metrics": [], "depends_on": {"macros": [], "nodes": ["model.jaffle_shop.stg_customers", "model.jaffle_shop.stg_orders", "model.jaffle_shop.stg_payments"]}, "compiled_path": "target/compiled/jaffle_shop/models/customers.sql", "compiled": true, "compiled_code": "with customers as (\n\n select * from \"jaffle_shop\".\"dev\".\"stg_customers\"\n\n),\n\norders as (\n\n select * from \"jaffle_shop\".\"dev\".\"stg_orders\"\n\n),\n\npayments as (\n\n select * from \"jaffle_shop\".\"dev\".\"stg_payments\"\n\n),\n\ncustomer_orders as (\n\n select\n customer_id,\n\n min(order_date) as first_order,\n max(order_date) as most_recent_order,\n count(order_id) as number_of_orders\n from orders\n\n group by customer_id\n\n),\n\ncustomer_payments as (\n\n select\n orders.customer_id,\n sum(amount) as total_amount\n\n from payments\n\n left join orders on\n payments.order_id = orders.order_id\n\n group by orders.customer_id\n\n),\n\nfinal as (\n\n select\n customers.customer_id,\n customers.first_name,\n customers.last_name,\n customer_orders.first_order,\n customer_orders.most_recent_order,\n customer_orders.number_of_orders,\n customer_payments.total_amount as customer_lifetime_value\n\n from customers\n\n left join customer_orders\n on customers.customer_id = customer_orders.customer_id\n\n left join customer_payments\n on customers.customer_id = customer_payments.customer_id\n\n)\n\nselect * from final", "extra_ctes_injected": true, "extra_ctes": []}, "model.jaffle_shop.orders": {"database": "jaffle_shop", "schema": "dev", "name": "orders", "resource_type": "model", "package_name": "jaffle_shop", "path": "orders.sql", "original_file_path": "models/orders.sql", "unique_id": "model.jaffle_shop.orders", "fqn": ["jaffle_shop", "orders"], "alias": "orders", "checksum": {"name": "sha256", "checksum": "c99b996bdc622502e5d0062779f76f74331f88824cffbd36d14cc6075c394d9a"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "table", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "post-hook": [], "pre-hook": []}, "tags": [], "description": "This table has basic information about orders, as well as some derived facts based on payments", "columns": {"order_id": {"name": "order_id", "description": "This is a unique identifier for an order", "meta": {}, "data_type": null, "quote": null, "tags": ["primary-key"]}, "customer_id": {"name": "customer_id", "description": "Foreign key to the customers table", "meta": {}, "data_type": null, "quote": null, "tags": []}, "order_date": {"name": "order_date", "description": "Date (UTC) that the order was placed", "meta": {}, "data_type": null, "quote": null, "tags": []}, "status": {"name": "status", "description": "Orders can be one of the following statuses:\n\n| status | description 
|\n|----------------|------------------------------------------------------------------------------------------------------------------------|\n| placed | The order has been placed but has not yet left the warehouse |\n| shipped | The order has ben shipped to the customer and is currently in transit |\n| completed | The order has been received by the customer |\n| return_pending | The customer has indicated that they would like to return the order, but it has not yet been received at the warehouse |\n| returned | The order has been returned by the customer and received at the warehouse |", "meta": {}, "data_type": null, "quote": null, "tags": []}, "amount": {"name": "amount", "description": "Total amount (AUD) of the order", "meta": {}, "data_type": null, "quote": null, "tags": []}, "credit_card_amount": {"name": "credit_card_amount", "description": "Amount of the order (AUD) paid for by credit card", "meta": {}, "data_type": null, "quote": null, "tags": []}, "coupon_amount": {"name": "coupon_amount", "description": "Amount of the order (AUD) paid for by coupon", "meta": {}, "data_type": null, "quote": null, "tags": []}, "bank_transfer_amount": {"name": "bank_transfer_amount", "description": "Amount of the order (AUD) paid for by bank transfer", "meta": {}, "data_type": null, "quote": null, "tags": []}, "gift_card_amount": {"name": "gift_card_amount", "description": "Amount of the order (AUD) paid for by gift card", "meta": {}, "data_type": null, "quote": null, "tags": []}}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": "jaffle_shop://models/schema.yml", "build_path": "target/run/jaffle_shop/models/orders.sql", "deferred": false, "unrendered_config": {"materialized": "table"}, "created_at": 1680025829.846783, "relation_name": "\"jaffle_shop\".\"dev\".\"orders\"", "raw_code": "{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}\n\nwith orders as (\n\n select * from {{ ref('stg_orders') }}\n\n),\n\npayments as (\n\n select * from {{ ref('stg_payments') }}\n\n),\n\norder_payments as (\n\n select\n order_id,\n\n {% for payment_method in payment_methods -%}\n sum(case when payment_method = '{{ payment_method }}' then amount else 0 end) as {{ payment_method }}_amount,\n {% endfor -%}\n\n sum(amount) as total_amount\n\n from payments\n\n group by order_id\n\n),\n\nfinal as (\n\n select\n orders.order_id,\n orders.customer_id,\n orders.order_date,\n orders.status,\n\n {% for payment_method in payment_methods -%}\n\n order_payments.{{ payment_method }}_amount,\n\n {% endfor -%}\n\n order_payments.total_amount as amount\n\n from orders\n\n\n left join order_payments\n on orders.order_id = order_payments.order_id\n\n)\n\nselect * from final\nLIMIT 90", "language": "sql", "refs": [["stg_orders"], ["stg_payments"]], "sources": [], "metrics": [], "depends_on": {"macros": [], "nodes": ["model.jaffle_shop.stg_orders", "model.jaffle_shop.stg_payments"]}, "compiled_path": "target/compiled/jaffle_shop/models/orders.sql", "compiled": true, "compiled_code": "\n\nwith orders as (\n\n select * from \"jaffle_shop\".\"dev\".\"stg_orders\"\n\n),\n\npayments as (\n\n select * from \"jaffle_shop\".\"dev\".\"stg_payments\"\n\n),\n\norder_payments as (\n\n select\n order_id,\n\n sum(case when payment_method = 'credit_card' then amount else 0 end) as credit_card_amount,\n sum(case when payment_method = 'coupon' then amount else 0 end) as coupon_amount,\n sum(case when payment_method = 'bank_transfer' then amount else 0 end) as bank_transfer_amount,\n sum(case when 
payment_method = 'gift_card' then amount else 0 end) as gift_card_amount,\n sum(amount) as total_amount\n\n from payments\n\n group by order_id\n\n),\n\nfinal as (\n\n select\n orders.order_id,\n orders.customer_id,\n orders.order_date,\n orders.status,\n\n order_payments.credit_card_amount,\n\n order_payments.coupon_amount,\n\n order_payments.bank_transfer_amount,\n\n order_payments.gift_card_amount,\n\n order_payments.total_amount as amount\n\n from orders\n\n\n left join order_payments\n on orders.order_id = order_payments.order_id\n\n)\n\nselect * from final\nLIMIT 90", "extra_ctes_injected": true, "extra_ctes": []}, "model.jaffle_shop.stg_customers": {"database": "jaffle_shop", "schema": "dev", "name": "stg_customers", "resource_type": "model", "package_name": "jaffle_shop", "path": "staging/stg_customers.sql", "original_file_path": "models/staging/stg_customers.sql", "unique_id": "model.jaffle_shop.stg_customers", "fqn": ["jaffle_shop", "staging", "stg_customers"], "alias": "stg_customers", "checksum": {"name": "sha256", "checksum": "6f18a29204dad1de6dbb0c288144c4990742e0a1e065c3b2a67b5f98334c22ba"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "view", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "post-hook": [], "pre-hook": []}, "tags": [], "description": "", "columns": {"customer_id": {"name": "customer_id", "description": "", "meta": {}, "data_type": null, "quote": null, "tags": ["primary-key"]}}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": "jaffle_shop://models/staging/schema.yml", "build_path": "target/run/jaffle_shop/models/staging/stg_customers.sql", "deferred": false, "unrendered_config": {"materialized": "view"}, "created_at": 1680025829.869041, "relation_name": "\"jaffle_shop\".\"dev\".\"stg_customers\"", "raw_code": "with source as (\n\n {#-\n Normally we would select from the table here, but we are using seeds to load\n our data in this project\n #}\n select * from {{ ref('raw_customers') }}\n\n),\n\nrenamed as (\n\n select\n id as customer_id,\n first_name,\n last_name\n\n from source\n\n)\n\nselect * from renamed", "language": "sql", "refs": [["raw_customers"]], "sources": [], "metrics": [], "depends_on": {"macros": [], "nodes": ["seed.jaffle_shop.raw_customers"]}, "compiled_path": "target/compiled/jaffle_shop/models/staging/stg_customers.sql", "compiled": true, "compiled_code": "with source as (\n select * from \"jaffle_shop\".\"dev\".\"raw_customers\"\n\n),\n\nrenamed as (\n\n select\n id as customer_id,\n first_name,\n last_name\n\n from source\n\n)\n\nselect * from renamed", "extra_ctes_injected": true, "extra_ctes": []}, "model.jaffle_shop.stg_payments": {"database": "jaffle_shop", "schema": "dev", "name": "stg_payments", "resource_type": "model", "package_name": "jaffle_shop", "path": "staging/stg_payments.sql", "original_file_path": "models/staging/stg_payments.sql", "unique_id": "model.jaffle_shop.stg_payments", "fqn": ["jaffle_shop", "staging", "stg_payments"], "alias": "stg_payments", "checksum": {"name": "sha256", "checksum": "ec8712986e99fdea30feaba2144bb9eca2bb6dc862880cbb6e21831857595fe0"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "view", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, 
"column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "post-hook": [], "pre-hook": []}, "tags": [], "description": "", "columns": {"payment_id": {"name": "payment_id", "description": "", "meta": {}, "data_type": null, "quote": null, "tags": ["primary-key"]}, "payment_method": {"name": "payment_method", "description": "", "meta": {}, "data_type": null, "quote": null, "tags": []}}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": "jaffle_shop://models/staging/schema.yml", "build_path": "target/run/jaffle_shop/models/staging/stg_payments.sql", "deferred": false, "unrendered_config": {"materialized": "view"}, "created_at": 1680025829.870419, "relation_name": "\"jaffle_shop\".\"dev\".\"stg_payments\"", "raw_code": "with source as (\n \n {#-\n Normally we would select from the table here, but we are using seeds to load\n our data in this project\n #}\n select * from {{ ref('raw_payments') }}\n\n),\n\nrenamed as (\n\n select\n id as payment_id,\n order_id,\n payment_method,\n\n -- `amount` is currently stored in cents, so we convert it to dollars\n amount / 100 as amount,\n amount as amount_cents\n\n from source\n\n)\n\nselect * from renamed", "language": "sql", "refs": [["raw_payments"]], "sources": [], "metrics": [], "depends_on": {"macros": [], "nodes": ["seed.jaffle_shop.raw_payments"]}, "compiled_path": "target/compiled/jaffle_shop/models/staging/stg_payments.sql", "compiled": true, "compiled_code": "with source as (\n select * from \"jaffle_shop\".\"dev\".\"raw_payments\"\n\n),\n\nrenamed as (\n\n select\n id as payment_id,\n order_id,\n payment_method,\n\n -- `amount` is currently stored in cents, so we convert it to dollars\n amount / 100 as amount,\n amount as amount_cents\n\n from source\n\n)\n\nselect * from renamed", "extra_ctes_injected": true, "extra_ctes": []}, "model.jaffle_shop.stg_orders": {"database": "jaffle_shop", "schema": "dev", "name": "stg_orders", "resource_type": "model", "package_name": "jaffle_shop", "path": "staging/stg_orders.sql", "original_file_path": "models/staging/stg_orders.sql", "unique_id": "model.jaffle_shop.stg_orders", "fqn": ["jaffle_shop", "staging", "stg_orders"], "alias": "stg_orders", "checksum": {"name": "sha256", "checksum": "afffa9cbc57e5fd2cf5898ebf571d444a62c9d6d7929d8133d30567fb9a2ce97"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "view", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "post-hook": [], "pre-hook": []}, "tags": [], "description": "", "columns": {"order_id": {"name": "order_id", "description": "", "meta": {}, "data_type": null, "quote": null, "tags": ["primary-key"]}, "status": {"name": "status", "description": "", "meta": {}, "data_type": null, "quote": null, "tags": []}}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": "jaffle_shop://models/staging/schema.yml", "build_path": "target/run/jaffle_shop/models/staging/stg_orders.sql", "deferred": false, "unrendered_config": {"materialized": "view"}, "created_at": 1680025829.869803, "relation_name": "\"jaffle_shop\".\"dev\".\"stg_orders\"", "raw_code": "with source as (\n\n {#-\n Normally we would select from the table here, but we are using seeds to load\n our data in this 
project\n #}\n select * from {{ ref('raw_orders') }}\n\n),\n\nrenamed as (\n\n select\n id as order_id,\n user_id as customer_id,\n order_date,\n status\n\n from source\n\n)\n\nselect * from renamed", "language": "sql", "refs": [["raw_orders"]], "sources": [], "metrics": [], "depends_on": {"macros": [], "nodes": ["seed.jaffle_shop.raw_orders"]}, "compiled_path": "target/compiled/jaffle_shop/models/staging/stg_orders.sql", "compiled": true, "compiled_code": "with source as (\n select * from \"jaffle_shop\".\"dev\".\"raw_orders\"\n\n),\n\nrenamed as (\n\n select\n id as order_id,\n user_id as customer_id,\n order_date,\n status\n\n from source\n\n)\n\nselect * from renamed", "extra_ctes_injected": true, "extra_ctes": []}, "seed.jaffle_shop.raw_customers": {"database": "jaffle_shop", "schema": "dev", "name": "raw_customers", "resource_type": "seed", "package_name": "jaffle_shop", "path": "raw_customers.csv", "original_file_path": "seeds/raw_customers.csv", "unique_id": "seed.jaffle_shop.raw_customers", "fqn": ["jaffle_shop", "raw_customers"], "alias": "raw_customers", "checksum": {"name": "sha256", "checksum": "24579b4b26098d43265376f3c50be8b10faf8e8fd95f5508074f10f76a12671d"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "seed", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "quote_columns": null, "post-hook": [], "pre-hook": []}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.8330948, "relation_name": "\"jaffle_shop\".\"dev\".\"raw_customers\"", "raw_code": "", "root_path": "/Users/dan/repos/jaffle_shop", "depends_on": {"macros": []}}, "seed.jaffle_shop.raw_orders": {"database": "jaffle_shop", "schema": "dev", "name": "raw_orders", "resource_type": "seed", "package_name": "jaffle_shop", "path": "raw_orders.csv", "original_file_path": "seeds/raw_orders.csv", "unique_id": "seed.jaffle_shop.raw_orders", "fqn": ["jaffle_shop", "raw_orders"], "alias": "raw_orders", "checksum": {"name": "sha256", "checksum": "ee6c68d1639ec2b23a4495ec12475e09b8ed4b61e23ab0411ea7ec76648356f7"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "seed", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "quote_columns": null, "post-hook": [], "pre-hook": []}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.834146, "relation_name": "\"jaffle_shop\".\"dev\".\"raw_orders\"", "raw_code": "", "root_path": "/Users/dan/repos/jaffle_shop", "depends_on": {"macros": []}}, "seed.jaffle_shop.raw_payments": {"database": "jaffle_shop", "schema": "dev", "name": "raw_payments", "resource_type": "seed", "package_name": "jaffle_shop", "path": "raw_payments.csv", "original_file_path": "seeds/raw_payments.csv", "unique_id": "seed.jaffle_shop.raw_payments", "fqn": ["jaffle_shop", "raw_payments"], "alias": "raw_payments", 
"checksum": {"name": "sha256", "checksum": "03fd407f3135f84456431a923f22fc185a2154079e210c20b690e3ab11687d11"}, "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "materialized": "seed", "incremental_strategy": null, "persist_docs": {}, "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "quote_columns": null, "post-hook": [], "pre-hook": []}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.8351219, "relation_name": "\"jaffle_shop\".\"dev\".\"raw_payments\"", "raw_code": "", "root_path": "/Users/dan/repos/jaffle_shop", "depends_on": {"macros": []}}, "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1": {"test_metadata": {"name": "unique", "kwargs": {"column_name": "customer_id", "model": "{{ get_where_subquery(ref('customers')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "unique_customers_customer_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "unique_customers_customer_id.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1", "fqn": ["jaffle_shop", "unique_customers_customer_id"], "alias": "unique_customers_customer_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.850326, "relation_name": null, "raw_code": "{{ test_unique(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["customers"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_unique"], "nodes": ["model.jaffle_shop.customers"]}, "compiled_path": null, "column_name": "customer_id", "file_key_name": "models.customers"}, "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "customer_id", "model": "{{ get_where_subquery(ref('customers')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_customers_customer_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_customers_customer_id.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d", "fqn": ["jaffle_shop", "not_null_customers_customer_id"], "alias": "not_null_customers_customer_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, 
"deferred": false, "unrendered_config": {}, "created_at": 1680025829.851261, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["customers"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.customers"]}, "compiled_path": null, "column_name": "customer_id", "file_key_name": "models.customers"}, "test.jaffle_shop.unique_orders_order_id.fed79b3a6e": {"test_metadata": {"name": "unique", "kwargs": {"column_name": "order_id", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "unique_orders_order_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "unique_orders_order_id.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.unique_orders_order_id.fed79b3a6e", "fqn": ["jaffle_shop", "unique_orders_order_id"], "alias": "unique_orders_order_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.852126, "relation_name": null, "raw_code": "{{ test_unique(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_unique"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "order_id", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_order_id.cf6c17daed": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "order_id", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_order_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_order_id.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_order_id.cf6c17daed", "fqn": ["jaffle_shop", "not_null_orders_order_id"], "alias": "not_null_orders_order_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.852982, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "order_id", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_customer_id.c5f02694af": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "customer_id", "model": "{{ 
get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_customer_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_customer_id.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_customer_id.c5f02694af", "fqn": ["jaffle_shop", "not_null_orders_customer_id"], "alias": "not_null_orders_customer_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.8539362, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "customer_id", "file_key_name": "models.orders"}, "test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2": {"test_metadata": {"name": "relationships", "kwargs": {"to": "ref('customers')", "field": "customer_id", "column_name": "customer_id", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "relationships_orders_customer_id__customer_id__ref_customers_", "resource_type": "test", "package_name": "jaffle_shop", "path": "relationships_orders_customer_id__customer_id__ref_customers_.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2", "fqn": ["jaffle_shop", "relationships_orders_customer_id__customer_id__ref_customers_"], "alias": "relationships_orders_customer_id__customer_id__ref_customers_", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.854784, "relation_name": null, "raw_code": "{{ test_relationships(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["customers"], ["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_relationships", "macro.dbt.get_where_subquery"], "nodes": ["model.jaffle_shop.customers", "model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "customer_id", "file_key_name": "models.orders"}, "test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3": {"test_metadata": {"name": "accepted_values", "kwargs": {"values": ["placed", "shipped", "completed", "return_pending", "returned"], "column_name": "status", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, 
"database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "accepted_values_orders_status__placed__shipped__completed__return_pending__returned", "resource_type": "test", "package_name": "jaffle_shop", "path": "accepted_values_orders_1ce6ab157c285f7cd2ac656013faf758.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3", "fqn": ["jaffle_shop", "accepted_values_orders_status__placed__shipped__completed__return_pending__returned"], "alias": "accepted_values_orders_1ce6ab157c285f7cd2ac656013faf758", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": "accepted_values_orders_1ce6ab157c285f7cd2ac656013faf758", "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {"alias": "accepted_values_orders_1ce6ab157c285f7cd2ac656013faf758"}, "created_at": 1680025829.85997, "relation_name": null, "raw_code": "{{ test_accepted_values(**_dbt_generic_test_kwargs) }}{{ config(alias=\"accepted_values_orders_1ce6ab157c285f7cd2ac656013faf758\") }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_accepted_values", "macro.dbt.get_where_subquery"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "status", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_amount.106140f9fd": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "amount", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_amount", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_amount.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_amount.106140f9fd", "fqn": ["jaffle_shop", "not_null_orders_amount"], "alias": "not_null_orders_amount", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.864409, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "amount", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "credit_card_amount", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_credit_card_amount", 
"resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_credit_card_amount.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59", "fqn": ["jaffle_shop", "not_null_orders_credit_card_amount"], "alias": "not_null_orders_credit_card_amount", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.865262, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "credit_card_amount", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "coupon_amount", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_coupon_amount", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_coupon_amount.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625", "fqn": ["jaffle_shop", "not_null_orders_coupon_amount"], "alias": "not_null_orders_coupon_amount", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.866215, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "coupon_amount", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "bank_transfer_amount", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_bank_transfer_amount", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_bank_transfer_amount.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49", "fqn": ["jaffle_shop", "not_null_orders_bank_transfer_amount"], "alias": "not_null_orders_bank_transfer_amount", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", 
"database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.867062, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "bank_transfer_amount", "file_key_name": "models.orders"}, "test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "gift_card_amount", "model": "{{ get_where_subquery(ref('orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_orders_gift_card_amount", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_orders_gift_card_amount.sql", "original_file_path": "models/schema.yml", "unique_id": "test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a", "fqn": ["jaffle_shop", "not_null_orders_gift_card_amount"], "alias": "not_null_orders_gift_card_amount", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.8679218, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.orders"]}, "compiled_path": null, "column_name": "gift_card_amount", "file_key_name": "models.orders"}, "test.jaffle_shop.unique_stg_customers_customer_id.c7614daada": {"test_metadata": {"name": "unique", "kwargs": {"column_name": "customer_id", "model": "{{ get_where_subquery(ref('stg_customers')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "unique_stg_customers_customer_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "unique_stg_customers_customer_id.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.unique_stg_customers_customer_id.c7614daada", "fqn": ["jaffle_shop", "staging", "unique_stg_customers_customer_id"], "alias": "unique_stg_customers_customer_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.870772, 
"relation_name": null, "raw_code": "{{ test_unique(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["stg_customers"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_unique"], "nodes": ["model.jaffle_shop.stg_customers"]}, "compiled_path": null, "column_name": "customer_id", "file_key_name": "models.stg_customers"}, "test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "customer_id", "model": "{{ get_where_subquery(ref('stg_customers')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_stg_customers_customer_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_stg_customers_customer_id.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa", "fqn": ["jaffle_shop", "staging", "not_null_stg_customers_customer_id"], "alias": "not_null_stg_customers_customer_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.871656, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["stg_customers"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.stg_customers"]}, "compiled_path": null, "column_name": "customer_id", "file_key_name": "models.stg_customers"}, "test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a": {"test_metadata": {"name": "unique", "kwargs": {"column_name": "order_id", "model": "{{ get_where_subquery(ref('stg_orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "unique_stg_orders_order_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "unique_stg_orders_order_id.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a", "fqn": ["jaffle_shop", "staging", "unique_stg_orders_order_id"], "alias": "unique_stg_orders_order_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.87264, "relation_name": null, "raw_code": "{{ test_unique(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["stg_orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_unique"], "nodes": ["model.jaffle_shop.stg_orders"]}, "compiled_path": null, "column_name": "order_id", "file_key_name": "models.stg_orders"}, "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64": 
{"test_metadata": {"name": "not_null", "kwargs": {"column_name": "order_id", "model": "{{ get_where_subquery(ref('stg_orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_stg_orders_order_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_stg_orders_order_id.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64", "fqn": ["jaffle_shop", "staging", "not_null_stg_orders_order_id"], "alias": "not_null_stg_orders_order_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.873489, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["stg_orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.stg_orders"]}, "compiled_path": null, "column_name": "order_id", "file_key_name": "models.stg_orders"}, "test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad": {"test_metadata": {"name": "accepted_values", "kwargs": {"values": ["placed", "shipped", "completed", "return_pending", "returned"], "column_name": "status", "model": "{{ get_where_subquery(ref('stg_orders')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned", "resource_type": "test", "package_name": "jaffle_shop", "path": "accepted_values_stg_orders_4f514bf94b77b7ea437830eec4421c58.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad", "fqn": ["jaffle_shop", "staging", "accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned"], "alias": "accepted_values_stg_orders_4f514bf94b77b7ea437830eec4421c58", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": "accepted_values_stg_orders_4f514bf94b77b7ea437830eec4421c58", "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {"alias": "accepted_values_stg_orders_4f514bf94b77b7ea437830eec4421c58"}, "created_at": 1680025829.874337, "relation_name": null, "raw_code": "{{ test_accepted_values(**_dbt_generic_test_kwargs) }}{{ config(alias=\"accepted_values_stg_orders_4f514bf94b77b7ea437830eec4421c58\") }}", "language": "sql", "refs": [["stg_orders"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_accepted_values", "macro.dbt.get_where_subquery"], "nodes": 
["model.jaffle_shop.stg_orders"]}, "compiled_path": null, "column_name": "status", "file_key_name": "models.stg_orders"}, "test.jaffle_shop.unique_stg_payments_payment_id.3744510712": {"test_metadata": {"name": "unique", "kwargs": {"column_name": "payment_id", "model": "{{ get_where_subquery(ref('stg_payments')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "unique_stg_payments_payment_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "unique_stg_payments_payment_id.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.unique_stg_payments_payment_id.3744510712", "fqn": ["jaffle_shop", "staging", "unique_stg_payments_payment_id"], "alias": "unique_stg_payments_payment_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.877381, "relation_name": null, "raw_code": "{{ test_unique(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["stg_payments"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_unique"], "nodes": ["model.jaffle_shop.stg_payments"]}, "compiled_path": null, "column_name": "payment_id", "file_key_name": "models.stg_payments"}, "test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075": {"test_metadata": {"name": "not_null", "kwargs": {"column_name": "payment_id", "model": "{{ get_where_subquery(ref('stg_payments')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "not_null_stg_payments_payment_id", "resource_type": "test", "package_name": "jaffle_shop", "path": "not_null_stg_payments_payment_id.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075", "fqn": ["jaffle_shop", "staging", "not_null_stg_payments_payment_id"], "alias": "not_null_stg_payments_payment_id", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": null, "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": ["primary-key"], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {}, "created_at": 1680025829.8782341, "relation_name": null, "raw_code": "{{ test_not_null(**_dbt_generic_test_kwargs) }}", "language": "sql", "refs": [["stg_payments"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_not_null"], "nodes": ["model.jaffle_shop.stg_payments"]}, "compiled_path": null, "column_name": "payment_id", "file_key_name": "models.stg_payments"}, "test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278": {"test_metadata": {"name": "accepted_values", "kwargs": {"values": ["credit_card", "coupon", "bank_transfer", "gift_card"], "column_name": 
"payment_method", "model": "{{ get_where_subquery(ref('stg_payments')) }}"}, "namespace": null}, "database": "jaffle_shop", "schema": "dev_dbt_test__audit", "name": "accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card", "resource_type": "test", "package_name": "jaffle_shop", "path": "accepted_values_stg_payments_c7909fb19b1f0177c2bf99c7912f06ef.sql", "original_file_path": "models/staging/schema.yml", "unique_id": "test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278", "fqn": ["jaffle_shop", "staging", "accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card"], "alias": "accepted_values_stg_payments_c7909fb19b1f0177c2bf99c7912f06ef", "checksum": {"name": "none", "checksum": ""}, "config": {"enabled": true, "alias": "accepted_values_stg_payments_c7909fb19b1f0177c2bf99c7912f06ef", "schema": "dbt_test__audit", "database": null, "tags": [], "meta": {}, "materialized": "test", "severity": "ERROR", "store_failures": null, "where": null, "limit": null, "fail_calc": "count(*)", "warn_if": "!= 0", "error_if": "!= 0"}, "tags": [], "description": "", "columns": {}, "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "build_path": null, "deferred": false, "unrendered_config": {"alias": "accepted_values_stg_payments_c7909fb19b1f0177c2bf99c7912f06ef"}, "created_at": 1680025829.87921, "relation_name": null, "raw_code": "{{ test_accepted_values(**_dbt_generic_test_kwargs) }}{{ config(alias=\"accepted_values_stg_payments_c7909fb19b1f0177c2bf99c7912f06ef\") }}", "language": "sql", "refs": [["stg_payments"]], "sources": [], "metrics": [], "depends_on": {"macros": ["macro.dbt.test_accepted_values", "macro.dbt.get_where_subquery"], "nodes": ["model.jaffle_shop.stg_payments"]}, "compiled_path": null, "column_name": "payment_method", "file_key_name": "models.stg_payments"}}, "sources": {}, "macros": {"macro.dbt_duckdb.duckdb__get_binding_char": {"name": "duckdb__get_binding_char", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/seed.sql", "original_file_path": "macros/seed.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_binding_char", "macro_sql": "{% macro duckdb__get_binding_char() %}\n {{ return('?') }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.528875, "supported_languages": null}, "macro.dbt_duckdb.duckdb__get_batch_size": {"name": "duckdb__get_batch_size", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/seed.sql", "original_file_path": "macros/seed.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_batch_size", "macro_sql": "{% macro duckdb__get_batch_size() %}\n {{ return(10000) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.529001, "supported_languages": null}, "macro.dbt_duckdb.duckdb__load_csv_rows": {"name": "duckdb__load_csv_rows", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/seed.sql", "original_file_path": "macros/seed.sql", "unique_id": "macro.dbt_duckdb.duckdb__load_csv_rows", "macro_sql": "{% macro duckdb__load_csv_rows(model, agate_table) %}\n {% set batch_size = get_batch_size() %}\n {% set agate_table = adapter.convert_datetimes_to_strs(agate_table) %}\n {% set cols_sql = 
get_seed_column_quoted_csv(model, agate_table.column_names) %}\n {% set bindings = [] %}\n\n {% set statements = [] %}\n\n {% for chunk in agate_table.rows | batch(batch_size) %}\n {% set bindings = [] %}\n\n {% for row in chunk %}\n {% do bindings.extend(row) %}\n {% endfor %}\n\n {% set sql %}\n insert into {{ this.render() }} ({{ cols_sql }}) values\n {% for row in chunk -%}\n ({%- for column in agate_table.column_names -%}\n {{ get_binding_char() }}\n {%- if not loop.last%},{%- endif %}\n {%- endfor -%})\n {%- if not loop.last%},{%- endif %}\n {%- endfor %}\n {% endset %}\n\n {% do adapter.add_query(sql, bindings=bindings, abridge_sql_log=True) %}\n\n {% if loop.index0 == 0 %}\n {% do statements.append(sql) %}\n {% endif %}\n {% endfor %}\n\n {# Return SQL so we can render it out into the compiled files #}\n {{ return(statements[0]) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_batch_size", "macro.dbt.get_seed_column_quoted_csv", "macro.dbt.get_binding_char"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5302458, "supported_languages": null}, "macro.dbt_duckdb.duckdb__snapshot_merge_sql": {"name": "duckdb__snapshot_merge_sql", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/snapshot_merge.sql", "original_file_path": "macros/snapshot_merge.sql", "unique_id": "macro.dbt_duckdb.duckdb__snapshot_merge_sql", "macro_sql": "{% macro duckdb__snapshot_merge_sql(target, source, insert_cols) -%}\n {%- set insert_cols_csv = insert_cols | join(', ') -%}\n\n {% set insert_sql %}\n insert into {{ target }} ({{ insert_cols_csv }})\n select {% for column in insert_cols -%}\n DBT_INTERNAL_SOURCE.{{ column }} {%- if not loop.last %}, {%- endif %}\n {%- endfor %}\n from {{ source }} as DBT_INTERNAL_SOURCE\n where DBT_INTERNAL_SOURCE.dbt_change_type = 'insert';\n {% endset %}\n\n {% do adapter.add_query(insert_sql, auto_begin=False) %}\n\n update {{ target }}\n set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to\n from {{ source }} as DBT_INTERNAL_SOURCE\n where DBT_INTERNAL_SOURCE.dbt_scd_id = {{ target.identifier }}.dbt_scd_id\n and DBT_INTERNAL_SOURCE.dbt_change_type = 'update'\n and {{ target.identifier }}.dbt_valid_to is null;\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5310872, "supported_languages": null}, "macro.dbt_duckdb.duckdb__get_catalog": {"name": "duckdb__get_catalog", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/catalog.sql", "original_file_path": "macros/catalog.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_catalog", "macro_sql": "{% macro duckdb__get_catalog(information_schema, schemas) -%}\n {%- call statement('catalog', fetch_result=True) -%}\n select\n '{{ database }}' as table_database,\n t.table_schema,\n t.table_name,\n t.table_type,\n '' as table_comment,\n c.column_name,\n c.ordinal_position as column_index,\n c.data_type column_type,\n '' as column_comment,\n '' as table_owner\n FROM information_schema.tables t JOIN information_schema.columns c ON t.table_schema = c.table_schema AND t.table_name = c.table_name\n WHERE (\n {%- for schema in schemas -%}\n upper(t.table_schema) = upper('{{ schema }}'){%- if not loop.last %} or {% endif -%}\n {%- endfor -%}\n )\n AND t.table_type IN ('BASE TABLE', 'VIEW')\n ORDER BY\n t.table_schema,\n t.table_name,\n c.ordinal_position\n {%- endcall 
-%}\n {{ return(load_result('catalog').table) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.531669, "supported_languages": null}, "macro.dbt_duckdb.duckdb__create_schema": {"name": "duckdb__create_schema", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__create_schema", "macro_sql": "{% macro duckdb__create_schema(relation) -%}\n {%- call statement('create_schema') -%}\n create schema if not exists {{ relation.without_identifier().include(database=adapter.use_database()) }}\n {%- endcall -%}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.541016, "supported_languages": null}, "macro.dbt_duckdb.duckdb__drop_schema": {"name": "duckdb__drop_schema", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__drop_schema", "macro_sql": "{% macro duckdb__drop_schema(relation) -%}\n {%- call statement('drop_schema') -%}\n drop schema if exists {{ relation.without_identifier().include(database=adapter.use_database()) }} cascade\n {%- endcall -%}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.541246, "supported_languages": null}, "macro.dbt_duckdb.duckdb__list_schemas": {"name": "duckdb__list_schemas", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__list_schemas", "macro_sql": "{% macro duckdb__list_schemas(database) -%}\n {% set sql %}\n select schema_name\n from information_schema.schemata\n {% endset %}\n {{ return(run_query(sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.run_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.541417, "supported_languages": null}, "macro.dbt_duckdb.duckdb__check_schema_exists": {"name": "duckdb__check_schema_exists", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__check_schema_exists", "macro_sql": "{% macro duckdb__check_schema_exists(information_schema, schema) -%}\n {% set sql -%}\n select count(*)\n from information_schema.schemata\n where schema_name='{{ schema }}'\n {%- endset %}\n {{ return(run_query(sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.run_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.541627, "supported_languages": null}, "macro.dbt_duckdb.duckdb__create_table_as": {"name": "duckdb__create_table_as", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__create_table_as", "macro_sql": "{% macro duckdb__create_table_as(temporary, relation, compiled_code, 
language='sql') -%}\n {%- if language == 'sql' -%}\n {%- set sql_header = config.get('sql_header', none) -%}\n\n {{ sql_header if sql_header is not none }}\n\n create {% if temporary: -%}temporary{%- endif %} table\n {{ relation.include(database=(not temporary and adapter.use_database()), schema=(not temporary)) }}\n as (\n {{ compiled_code }}\n );\n {%- elif language == 'python' -%}\n {{ py_write_table(temporary=temporary, relation=relation, compiled_code=compiled_code) }}\n {%- else -%}\n {% do exceptions.raise_compiler_error(\"duckdb__create_table_as macro didn't get supported language, it got %s\" % language) %}\n {%- endif -%}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.py_write_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.542319, "supported_languages": null}, "macro.dbt_duckdb.py_write_table": {"name": "py_write_table", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.py_write_table", "macro_sql": "{% macro py_write_table(temporary, relation, compiled_code) -%}\n{{ compiled_code }}\n\ndef materialize(df, con):\n try:\n import pyarrow\n except ImportError:\n pass\n finally:\n if isinstance(df, pyarrow.Table):\n # https://github.com/duckdb/duckdb/issues/6584\n import pyarrow.dataset\n con.execute('create table {{ relation.include(database=adapter.use_database()) }} as select * from df')\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.542516, "supported_languages": null}, "macro.dbt_duckdb.duckdb__create_view_as": {"name": "duckdb__create_view_as", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__create_view_as", "macro_sql": "{% macro duckdb__create_view_as(relation, sql) -%}\n {%- set sql_header = config.get('sql_header', none) -%}\n\n {{ sql_header if sql_header is not none }}\n create view {{ relation.include(database=adapter.use_database()) }} as (\n {{ sql }}\n );\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.542791, "supported_languages": null}, "macro.dbt_duckdb.duckdb__get_columns_in_relation": {"name": "duckdb__get_columns_in_relation", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_columns_in_relation", "macro_sql": "{% macro duckdb__get_columns_in_relation(relation) -%}\n {% call statement('get_columns_in_relation', fetch_result=True) %}\n select\n column_name,\n data_type,\n character_maximum_length,\n numeric_precision,\n numeric_scale\n\n from information_schema.columns\n where table_name = '{{ relation.identifier }}'\n {% if relation.schema %}\n and table_schema = '{{ relation.schema }}'\n {% endif %}\n order by ordinal_position\n\n {% endcall %}\n {% set table = load_result('get_columns_in_relation').table %}\n {{ return(sql_convert_columns_in_relation(table)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement", "macro.dbt.sql_convert_columns_in_relation"]}, "description": "", "meta": {}, "docs": {"show": 
true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.543176, "supported_languages": null}, "macro.dbt_duckdb.duckdb__list_relations_without_caching": {"name": "duckdb__list_relations_without_caching", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__list_relations_without_caching", "macro_sql": "{% macro duckdb__list_relations_without_caching(schema_relation) %}\n {% call statement('list_relations_without_caching', fetch_result=True) -%}\n select\n '{{ schema_relation.database }}' as database,\n table_name as name,\n table_schema as schema,\n CASE table_type\n WHEN 'BASE TABLE' THEN 'table'\n WHEN 'VIEW' THEN 'view'\n WHEN 'LOCAL TEMPORARY' THEN 'table'\n END as type\n from information_schema.tables\n where table_schema = '{{ schema_relation.schema }}'\n {% endcall %}\n {{ return(load_result('list_relations_without_caching').table) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.54345, "supported_languages": null}, "macro.dbt_duckdb.duckdb__drop_relation": {"name": "duckdb__drop_relation", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__drop_relation", "macro_sql": "{% macro duckdb__drop_relation(relation) -%}\n {% call statement('drop_relation', auto_begin=False) -%}\n drop {{ relation.type }} if exists {{ relation.include(database=adapter.use_database()) }} cascade\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.543704, "supported_languages": null}, "macro.dbt_duckdb.duckdb__truncate_relation": {"name": "duckdb__truncate_relation", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__truncate_relation", "macro_sql": "{% macro duckdb__truncate_relation(relation) -%}\n {% call statement('truncate_relation') -%}\n DELETE FROM {{ relation.include(database=adapter.use_database()) }} WHERE 1=1\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5438988, "supported_languages": null}, "macro.dbt_duckdb.duckdb__rename_relation": {"name": "duckdb__rename_relation", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__rename_relation", "macro_sql": "{% macro duckdb__rename_relation(from_relation, to_relation) -%}\n {% set target_name = adapter.quote_as_configured(to_relation.identifier, 'identifier') %}\n {% call statement('rename_relation') -%}\n alter {{ to_relation.type }} {{ from_relation }} rename to {{ target_name }}\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.544176, "supported_languages": null}, 
"macro.dbt_duckdb.duckdb__make_temp_relation": {"name": "duckdb__make_temp_relation", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__make_temp_relation", "macro_sql": "{% macro duckdb__make_temp_relation(base_relation, suffix) %}\n {% set tmp_identifier = base_relation.identifier ~ suffix ~ py_current_timestring() %}\n {% do return(base_relation.incorporate(\n path={\n \"identifier\": tmp_identifier,\n \"schema\": none,\n \"database\": none\n })) -%}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.py_current_timestring"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.544481, "supported_languages": null}, "macro.dbt_duckdb.duckdb__current_timestamp": {"name": "duckdb__current_timestamp", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__current_timestamp", "macro_sql": "{% macro duckdb__current_timestamp() -%}\n now()\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5445492, "supported_languages": null}, "macro.dbt_duckdb.duckdb__snapshot_string_as_time": {"name": "duckdb__snapshot_string_as_time", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__snapshot_string_as_time", "macro_sql": "{% macro duckdb__snapshot_string_as_time(timestamp) -%}\n {%- set result = \"'\" ~ timestamp ~ \"'::timestamp\" -%}\n {{ return(result) }}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.544718, "supported_languages": null}, "macro.dbt_duckdb.duckdb__snapshot_get_time": {"name": "duckdb__snapshot_get_time", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__snapshot_get_time", "macro_sql": "{% macro duckdb__snapshot_get_time() -%}\n {{ current_timestamp() }}::timestamp\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.current_timestamp"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5448031, "supported_languages": null}, "macro.dbt_duckdb.duckdb__get_incremental_default_sql": {"name": "duckdb__get_incremental_default_sql", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_incremental_default_sql", "macro_sql": "{% macro duckdb__get_incremental_default_sql(arg_dict) %}\n {% do return(get_incremental_delete_insert_sql(arg_dict)) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_incremental_delete_insert_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.545005, "supported_languages": null}, "macro.dbt_duckdb.duckdb__get_incremental_delete_insert_sql": {"name": "duckdb__get_incremental_delete_insert_sql", "resource_type": "macro", 
"package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_incremental_delete_insert_sql", "macro_sql": "{% macro duckdb__get_incremental_delete_insert_sql(arg_dict) %}\n {% do return(get_delete_insert_merge_sql(arg_dict[\"target_relation\"].include(database=adapter.use_database()), arg_dict[\"temp_relation\"], arg_dict[\"unique_key\"], arg_dict[\"dest_columns\"])) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_delete_insert_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.54529, "supported_languages": null}, "macro.dbt_duckdb.duckdb__get_incremental_append_sql": {"name": "duckdb__get_incremental_append_sql", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.duckdb__get_incremental_append_sql", "macro_sql": "{% macro duckdb__get_incremental_append_sql(arg_dict) %}\n {% do return(get_insert_into_sql(arg_dict[\"target_relation\"].include(database=adapter.use_database()), arg_dict[\"temp_relation\"], arg_dict[\"dest_columns\"])) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_insert_into_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5455391, "supported_languages": null}, "macro.dbt_duckdb.location_exists": {"name": "location_exists", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.location_exists", "macro_sql": "{% macro location_exists(location) -%}\n {% do return(adapter.location_exists(location)) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.545696, "supported_languages": null}, "macro.dbt_duckdb.write_to_file": {"name": "write_to_file", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.write_to_file", "macro_sql": "{% macro write_to_file(relation, location, options) -%}\n {% call statement('write_to_file') -%}\n copy {{ relation }} to '{{ location }}' ({{ options }})\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.54589, "supported_languages": null}, "macro.dbt_duckdb.register_glue_table": {"name": "register_glue_table", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.register_glue_table", "macro_sql": "{% macro register_glue_table(register, glue_database, relation, location, format) -%}\n {% if location.startswith(\"s3://\") and register == true %}\n {%- set column_list = adapter.get_columns_in_relation(relation) -%}\n {% do adapter.register_glue_table(glue_database, relation.identifier, column_list, location, format) %}\n {% endif %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 
1680025829.5462441, "supported_languages": null}, "macro.dbt_duckdb.render_write_options": {"name": "render_write_options", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/adapters.sql", "original_file_path": "macros/adapters.sql", "unique_id": "macro.dbt_duckdb.render_write_options", "macro_sql": "{% macro render_write_options(config) -%}\n {% set options = config.get('options', {}) %}\n {% for k in options %}\n {% if options[k] is string %}\n {% set _ = options.update({k: render(options[k])}) %}\n {% else %}\n {% set _ = options.update({k: render(options[k])}) %}\n {% endif %}\n {% endfor %}\n\n {# legacy top-level write options #}\n {% if config.get('format') %}\n {% set _ = options.update({'format': render(config.get('format'))}) %}\n {% endif %}\n {% if config.get('delimiter') %}\n {% set _ = options.update({'delimiter': render(config.get('delimiter'))}) %}\n {% endif %}\n\n {% do return(options) %}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.54719, "supported_languages": null}, "macro.dbt_duckdb.materialization_table_duckdb": {"name": "materialization_table_duckdb", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/materializations/table.sql", "original_file_path": "macros/materializations/table.sql", "unique_id": "macro.dbt_duckdb.materialization_table_duckdb", "macro_sql": "{% materialization table, adapter=\"duckdb\", supported_languages=['sql', 'python'] %}\n\n {%- set language = model['language'] -%}\n\n {%- set existing_relation = load_cached_relation(this) -%}\n {%- set target_relation = this.incorporate(type='table') %}\n {%- set intermediate_relation = make_intermediate_relation(target_relation) -%}\n -- the intermediate_relation should not already exist in the database; get_relation\n -- will return None in that case. 
Otherwise, we get a relation that we can drop\n -- later, before we try to use this name for the current operation\n {%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation) -%}\n /*\n See ../view/view.sql for more information about this relation.\n */\n {%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}\n {%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}\n -- as above, the backup_relation should not already exist\n {%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}\n -- grab current tables grants config for comparision later on\n {% set grant_config = config.get('grants') %}\n\n -- drop the temp relations if they exist already in the database\n {{ drop_relation_if_exists(preexisting_intermediate_relation) }}\n {{ drop_relation_if_exists(preexisting_backup_relation) }}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n -- build model\n {% call statement('main', language=language) -%}\n {{- create_table_as(False, intermediate_relation, compiled_code, language) }}\n {%- endcall %}\n\n -- cleanup\n {% if existing_relation is not none %}\n {{ adapter.rename_relation(existing_relation, backup_relation) }}\n {% endif %}\n\n {{ adapter.rename_relation(intermediate_relation, target_relation) }}\n\n {% do create_indexes(target_relation) %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n {% set should_revoke = should_revoke(existing_relation, full_refresh_mode=True) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n -- `COMMIT` happens here\n {{ adapter.commit() }}\n\n -- finally, drop the existing/backup relation after the commit\n {{ drop_relation_if_exists(backup_relation) }}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n{% endmaterialization %}", "depends_on": {"macros": ["macro.dbt.load_cached_relation", "macro.dbt.make_intermediate_relation", "macro.dbt.make_backup_relation", "macro.dbt.drop_relation_if_exists", "macro.dbt.run_hooks", "macro.dbt.statement", "macro.dbt.create_table_as", "macro.dbt.create_indexes", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.549718, "supported_languages": ["sql", "python"]}, "macro.dbt_duckdb.materialization_external_duckdb": {"name": "materialization_external_duckdb", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/materializations/external.sql", "original_file_path": "macros/materializations/external.sql", "unique_id": "macro.dbt_duckdb.materialization_external_duckdb", "macro_sql": "{% materialization external, adapter=\"duckdb\", supported_languages=['sql', 'python'] %}\n\n {%- set location = render(config.get('location', default=external_location(this, config))) -%})\n {%- set rendered_options = render_write_options(config) -%}\n {%- set write_options = adapter.external_write_options(location, rendered_options) -%}\n {%- set read_location = adapter.external_read_location(location, rendered_options) -%}\n\n -- set language - python or sql\n {%- set language = model['language'] -%}\n\n {%- set target_relation = this.incorporate(type='view') %}\n\n -- Continue as 
normal materialization\n {%- set existing_relation = load_cached_relation(this) -%}\n {%- set temp_relation = make_intermediate_relation(this.incorporate(type='table'), suffix='__dbt_tmp') -%}\n {%- set intermediate_relation = make_intermediate_relation(target_relation, suffix='__dbt_int') -%}\n -- the intermediate_relation should not already exist in the database; get_relation\n -- will return None in that case. Otherwise, we get a relation that we can drop\n -- later, before we try to use this name for the current operation\n {%- set preexisting_temp_relation = load_cached_relation(temp_relation) -%}\n {%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation) -%}\n /*\n See ../view/view.sql for more information about this relation.\n */\n {%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}\n {%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}\n -- as above, the backup_relation should not already exist\n {%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}\n -- grab current tables grants config for comparision later on\n {% set grant_config = config.get('grants') %}\n\n -- drop the temp relations if they exist already in the database\n {{ drop_relation_if_exists(preexisting_intermediate_relation) }}\n {{ drop_relation_if_exists(preexisting_temp_relation) }}\n {{ drop_relation_if_exists(preexisting_backup_relation) }}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n -- build model\n {% call statement('create_table', language=language) -%}\n {{- create_table_as(False, temp_relation, compiled_code, language) }}\n {%- endcall %}\n\n -- write an temp relation into file\n {{ write_to_file(temp_relation, location, write_options) }}\n -- create a view on top of the location\n {% call statement('main', language='sql') -%}\n create or replace view {{ intermediate_relation.include(database=adapter.use_database()) }} as (\n select * from '{{ read_location }}'\n );\n {%- endcall %}\n\n -- cleanup\n {% if existing_relation is not none %}\n {{ adapter.rename_relation(existing_relation, backup_relation) }}\n {% endif %}\n\n {{ adapter.rename_relation(intermediate_relation, target_relation) }}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n {% set should_revoke = should_revoke(existing_relation, full_refresh_mode=True) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n -- `COMMIT` happens here\n {{ adapter.commit() }}\n\n -- finally, drop the existing/backup relation after the commit\n {{ drop_relation_if_exists(backup_relation) }}\n {{ drop_relation_if_exists(temp_relation) }}\n\n -- register table into glue\n {%- set glue_register = config.get('glue_register', default=false) -%}\n {%- set glue_database = render(config.get('glue_database', default='default')) -%}\n {% do register_glue_table(glue_register, glue_database, target_relation, location, format) %}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{% endmaterialization %}", "depends_on": {"macros": ["macro.dbt_duckdb.external_location", "macro.dbt_duckdb.render_write_options", "macro.dbt.load_cached_relation", "macro.dbt.make_intermediate_relation", "macro.dbt.make_backup_relation", "macro.dbt.drop_relation_if_exists", "macro.dbt.run_hooks", 
"macro.dbt.statement", "macro.dbt.create_table_as", "macro.dbt_duckdb.write_to_file", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs", "macro.dbt_duckdb.register_glue_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.55407, "supported_languages": ["sql", "python"]}, "macro.dbt_duckdb.materialization_incremental_duckdb": {"name": "materialization_incremental_duckdb", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/materializations/incremental.sql", "original_file_path": "macros/materializations/incremental.sql", "unique_id": "macro.dbt_duckdb.materialization_incremental_duckdb", "macro_sql": "{% materialization incremental, adapter=\"duckdb\", supported_languages=['sql', 'python'] -%}\n\n {%- set language = model['language'] -%}\n\n -- relations\n {%- set existing_relation = load_cached_relation(this) -%}\n {%- set target_relation = this.incorporate(type='table') -%}\n {%- set temp_relation = make_temp_relation(target_relation)-%}\n {%- set intermediate_relation = make_intermediate_relation(target_relation)-%}\n {%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}\n {%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}\n\n -- configs\n {%- set unique_key = config.get('unique_key') -%}\n {%- set full_refresh_mode = (should_full_refresh() or existing_relation.is_view) -%}\n {%- set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') -%}\n\n -- the temp_ and backup_ relations should not already exist in the database; get_relation\n -- will return None in that case. Otherwise, we get a relation that we can drop\n -- later, before we try to use this name for the current operation. This has to happen before\n -- BEGIN, in a separate transaction\n {%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation)-%}\n {%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}\n -- grab current tables grants config for comparision later on\n {% set grant_config = config.get('grants') %}\n {{ drop_relation_if_exists(preexisting_intermediate_relation) }}\n {{ drop_relation_if_exists(preexisting_backup_relation) }}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n {% set to_drop = [] %}\n\n {% if existing_relation is none %}\n {% set build_sql = create_table_as(False, target_relation, compiled_code, language) %}\n {% elif full_refresh_mode %}\n {% set build_sql = create_table_as(False, intermediate_relation, compiled_code, language) %}\n {% set need_swap = true %}\n {% else %}\n {% if language == 'python' %}\n {% set build_python = create_table_as(False, temp_relation, compiled_code, language) %}\n {% call statement(\"pre\", language=language) %}\n {{- build_python }}\n {% endcall %}\n {% else %} {# SQL #}\n {% do run_query(create_table_as(True, temp_relation, compiled_code, language)) %}\n {% endif %}\n {% do adapter.expand_target_column_types(\n from_relation=temp_relation,\n to_relation=target_relation) %}\n {#-- Process schema changes. Returns dict of changes if successful. 
Use source columns for upserting/merging --#}\n {% set dest_columns = process_schema_changes(on_schema_change, temp_relation, existing_relation) %}\n {% if not dest_columns %}\n {% set dest_columns = adapter.get_columns_in_relation(existing_relation) %}\n {% endif %}\n\n {#-- Get the incremental_strategy, the macro to use for the strategy, and build the sql --#}\n {% set incremental_strategy = config.get('incremental_strategy') or 'default' %}\n {% set incremental_predicates = config.get('incremental_predicates', none) %}\n {% set strategy_sql_macro_func = adapter.get_incremental_strategy_macro(context, incremental_strategy) %}\n {% set strategy_arg_dict = ({'target_relation': target_relation, 'temp_relation': temp_relation, 'unique_key': unique_key, 'dest_columns': dest_columns, 'predicates': incremental_predicates }) %}\n {% set build_sql = strategy_sql_macro_func(strategy_arg_dict) %}\n {% set language = \"sql\" %}\n\n {% endif %}\n\n {% call statement(\"main\", language=language) %}\n {{- build_sql }}\n {% endcall %}\n\n {% if need_swap %}\n {% do adapter.rename_relation(target_relation, backup_relation) %}\n {% do adapter.rename_relation(intermediate_relation, target_relation) %}\n {% do to_drop.append(backup_relation) %}\n {% endif %}\n\n {% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n {% if existing_relation is none or existing_relation.is_view or should_full_refresh() %}\n {% do create_indexes(target_relation) %}\n {% endif %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n -- `COMMIT` happens here\n {% do adapter.commit() %}\n\n {% for rel in to_drop %}\n {% do adapter.drop_relation(rel) %}\n {% endfor %}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{%- endmaterialization %}", "depends_on": {"macros": ["macro.dbt.load_cached_relation", "macro.dbt.make_temp_relation", "macro.dbt.make_intermediate_relation", "macro.dbt.make_backup_relation", "macro.dbt.should_full_refresh", "macro.dbt.incremental_validate_on_schema_change", "macro.dbt.drop_relation_if_exists", "macro.dbt.run_hooks", "macro.dbt.create_table_as", "macro.dbt.statement", "macro.dbt.run_query", "macro.dbt.process_schema_changes", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs", "macro.dbt.create_indexes"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.559038, "supported_languages": ["sql", "python"]}, "macro.dbt_duckdb.duckdb__dateadd": {"name": "duckdb__dateadd", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/utils/dateadd.sql", "original_file_path": "macros/utils/dateadd.sql", "unique_id": "macro.dbt_duckdb.duckdb__dateadd", "macro_sql": "{% macro duckdb__dateadd(datepart, interval, from_date_or_timestamp) %}\n\n {{ from_date_or_timestamp }} + ((interval '1 {{ datepart }}') * ({{ interval }}))\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5592582, "supported_languages": null}, "macro.dbt_duckdb.duckdb__listagg": {"name": "duckdb__listagg", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/utils/listagg.sql", "original_file_path": "macros/utils/listagg.sql", "unique_id": 
"macro.dbt_duckdb.duckdb__listagg", "macro_sql": "{% macro duckdb__listagg(measure, delimiter_text, order_by_clause, limit_num) -%}\n {% if limit_num -%}\n list_aggr(\n (array_agg(\n {{ measure }}\n {% if order_by_clause -%}\n {{ order_by_clause }}\n {%- endif %}\n ))[1:{{ limit_num }}],\n 'string_agg',\n {{ delimiter_text }}\n )\n {%- else %}\n string_agg(\n {{ measure }},\n {{ delimiter_text }}\n {% if order_by_clause -%}\n {{ order_by_clause }}\n {%- endif %}\n )\n {%- endif %}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.559789, "supported_languages": null}, "macro.dbt_duckdb.duckdb__datediff": {"name": "duckdb__datediff", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/utils/datediff.sql", "original_file_path": "macros/utils/datediff.sql", "unique_id": "macro.dbt_duckdb.duckdb__datediff", "macro_sql": "{% macro duckdb__datediff(first_date, second_date, datepart) -%}\n\n {% if datepart == 'year' %}\n (date_part('year', ({{second_date}})::date) - date_part('year', ({{first_date}})::date))\n {% elif datepart == 'quarter' %}\n ({{ datediff(first_date, second_date, 'year') }} * 4 + date_part('quarter', ({{second_date}})::date) - date_part('quarter', ({{first_date}})::date))\n {% elif datepart == 'month' %}\n ({{ datediff(first_date, second_date, 'year') }} * 12 + date_part('month', ({{second_date}})::date) - date_part('month', ({{first_date}})::date))\n {% elif datepart == 'day' %}\n (({{second_date}})::date - ({{first_date}})::date)\n {% elif datepart == 'week' %}\n ({{ datediff(first_date, second_date, 'day') }} / 7 + case\n when date_part('dow', ({{first_date}})::timestamp) <= date_part('dow', ({{second_date}})::timestamp) then\n case when {{first_date}} <= {{second_date}} then 0 else -1 end\n else\n case when {{first_date}} <= {{second_date}} then 1 else 0 end\n end)\n {% elif datepart == 'hour' %}\n ({{ datediff(first_date, second_date, 'day') }} * 24 + date_part('hour', ({{second_date}})::timestamp) - date_part('hour', ({{first_date}})::timestamp))\n {% elif datepart == 'minute' %}\n ({{ datediff(first_date, second_date, 'hour') }} * 60 + date_part('minute', ({{second_date}})::timestamp) - date_part('minute', ({{first_date}})::timestamp))\n {% elif datepart == 'second' %}\n ({{ datediff(first_date, second_date, 'minute') }} * 60 + floor(date_part('second', ({{second_date}})::timestamp)) - floor(date_part('second', ({{first_date}})::timestamp)))\n {% elif datepart == 'millisecond' %}\n ({{ datediff(first_date, second_date, 'minute') }} * 60000 + floor(date_part('millisecond', ({{second_date}})::timestamp)) - floor(date_part('millisecond', ({{first_date}})::timestamp)))\n {% elif datepart == 'microsecond' %}\n ({{ datediff(first_date, second_date, 'minute') }} * 60000000 + floor(date_part('microsecond', ({{second_date}})::timestamp)) - floor(date_part('microsecond', ({{first_date}})::timestamp)))\n {% else %}\n {{ exceptions.raise_compiler_error(\"Unsupported datepart for macro datediff in postgres: {!r}\".format(datepart)) }}\n {% endif %}\n\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.datediff"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.562541, "supported_languages": null}, "macro.dbt_duckdb.duckdb__any_value": {"name": "duckdb__any_value", "resource_type": "macro", "package_name": "dbt_duckdb", "path": 
"macros/utils/any_value.sql", "original_file_path": "macros/utils/any_value.sql", "unique_id": "macro.dbt_duckdb.duckdb__any_value", "macro_sql": "{% macro duckdb__any_value(expression) -%}\n\n arbitrary({{ expression }})\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.562667, "supported_languages": null}, "macro.dbt_duckdb.register_upstream_external_models": {"name": "register_upstream_external_models", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/utils/upstream.sql", "original_file_path": "macros/utils/upstream.sql", "unique_id": "macro.dbt_duckdb.register_upstream_external_models", "macro_sql": "{%- macro register_upstream_external_models() -%}\n{% if execute %}\n{% set upstream_nodes = {} %}\n{% set upstream_schemas = {} %}\n{% for node in selected_resources %}\n {% for upstream_node in graph['nodes'][node]['depends_on']['nodes'] %}\n {% if upstream_node not in upstream_nodes and upstream_node not in selected_resources %}\n {% do upstream_nodes.update({upstream_node: None}) %}\n {% set upstream = graph['nodes'].get(upstream_node) %}\n {% if upstream\n and upstream.resource_type in ('model', 'seed')\n and upstream.config.materialized=='external'\n %}\n {%- set upstream_rel = api.Relation.create(\n database=upstream['database'],\n schema=upstream['schema'],\n identifier=upstream['alias']\n ) -%}\n {%- set location = upstream.config.get('location', external_location(upstream, upstream.config)) -%}\n {%- set rendered_options = render_write_options(config) -%}\n {%- set upstream_location = adapter.external_read_location(location, rendered_options) -%}\n {% if upstream_rel.schema not in upstream_schemas %}\n {% call statement('main', language='sql') -%}\n create schema if not exists {{ upstream_rel.schema }}\n {%- endcall %}\n {% do upstream_schemas.update({upstream_rel.schema: None}) %}\n {% endif %}\n {% call statement('main', language='sql') -%}\n create or replace view {{ upstream_rel.include(database=adapter.use_database()) }} as (\n select * from '{{ upstream_location }}'\n );\n {%- endcall %}\n {%- endif %}\n {% endif %}\n {% endfor %}\n{% endfor %}\n{% do adapter.commit() %}\n{% endif %}\n{%- endmacro -%}", "depends_on": {"macros": ["macro.dbt_duckdb.external_location", "macro.dbt_duckdb.render_write_options", "macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.564817, "supported_languages": null}, "macro.dbt_duckdb.duckdb__split_part": {"name": "duckdb__split_part", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/utils/splitpart.sql", "original_file_path": "macros/utils/splitpart.sql", "unique_id": "macro.dbt_duckdb.duckdb__split_part", "macro_sql": "{% macro duckdb__split_part(string_text, delimiter_text, part_number) %}\n\n {% if part_number >= 0 %}\n coalesce(string_split({{ string_text }}, {{ delimiter_text }})[ {{ part_number }} ], '')\n {% else %}\n {{ dbt._split_part_negative(string_text, delimiter_text, part_number) }}\n {% endif %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt._split_part_negative"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.565182, "supported_languages": null}, "macro.dbt_duckdb.duckdb__last_day": {"name": "duckdb__last_day", "resource_type": "macro", 
"package_name": "dbt_duckdb", "path": "macros/utils/lastday.sql", "original_file_path": "macros/utils/lastday.sql", "unique_id": "macro.dbt_duckdb.duckdb__last_day", "macro_sql": "{% macro duckdb__last_day(date, datepart) -%}\n\n {%- if datepart == 'quarter' -%}\n -- duckdb dateadd does not support quarter interval.\n cast(\n {{dbt.dateadd('day', '-1',\n dbt.dateadd('month', '3', dbt.date_trunc(datepart, date))\n )}}\n as date)\n {%- else -%}\n {{dbt.default_last_day(date, datepart)}}\n {%- endif -%}\n\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.dateadd", "macro.dbt.date_trunc", "macro.dbt.default_last_day"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.565612, "supported_languages": null}, "macro.dbt_duckdb.external_location": {"name": "external_location", "resource_type": "macro", "package_name": "dbt_duckdb", "path": "macros/utils/external_location.sql", "original_file_path": "macros/utils/external_location.sql", "unique_id": "macro.dbt_duckdb.external_location", "macro_sql": "{%- macro external_location(relation, config) -%}\n {%- if config.get('options', {}).get('partition_by') is none -%}\n {%- set format = config.get('format', 'parquet') -%}\n {{- adapter.external_root() }}/{{ relation.identifier }}.{{ format }}\n {%- else -%}\n {{- adapter.external_root() }}/{{ relation.identifier }}\n {%- endif -%}\n{%- endmacro -%}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.566093, "supported_languages": null}, "macro.dbt.run_hooks": {"name": "run_hooks", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/hooks.sql", "original_file_path": "macros/materializations/hooks.sql", "unique_id": "macro.dbt.run_hooks", "macro_sql": "{% macro run_hooks(hooks, inside_transaction=True) %}\n {% for hook in hooks | selectattr('transaction', 'equalto', inside_transaction) %}\n {% if not inside_transaction and loop.first %}\n {% call statement(auto_begin=inside_transaction) %}\n commit;\n {% endcall %}\n {% endif %}\n {% set rendered = render(hook.get('sql')) | trim %}\n {% if (rendered | length) > 0 %}\n {% call statement(auto_begin=inside_transaction) %}\n {{ rendered }}\n {% endcall %}\n {% endif %}\n {% endfor %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.567016, "supported_languages": null}, "macro.dbt.make_hook_config": {"name": "make_hook_config", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/hooks.sql", "original_file_path": "macros/materializations/hooks.sql", "unique_id": "macro.dbt.make_hook_config", "macro_sql": "{% macro make_hook_config(sql, inside_transaction) %}\n {{ tojson({\"sql\": sql, \"transaction\": inside_transaction}) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5671868, "supported_languages": null}, "macro.dbt.before_begin": {"name": "before_begin", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/hooks.sql", "original_file_path": "macros/materializations/hooks.sql", "unique_id": "macro.dbt.before_begin", "macro_sql": "{% macro before_begin(sql) %}\n {{ 
make_hook_config(sql, inside_transaction=False) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.make_hook_config"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.567308, "supported_languages": null}, "macro.dbt.in_transaction": {"name": "in_transaction", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/hooks.sql", "original_file_path": "macros/materializations/hooks.sql", "unique_id": "macro.dbt.in_transaction", "macro_sql": "{% macro in_transaction(sql) %}\n {{ make_hook_config(sql, inside_transaction=True) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.make_hook_config"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.567431, "supported_languages": null}, "macro.dbt.after_commit": {"name": "after_commit", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/hooks.sql", "original_file_path": "macros/materializations/hooks.sql", "unique_id": "macro.dbt.after_commit", "macro_sql": "{% macro after_commit(sql) %}\n {{ make_hook_config(sql, inside_transaction=False) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.make_hook_config"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.567552, "supported_languages": null}, "macro.dbt.set_sql_header": {"name": "set_sql_header", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/configs.sql", "original_file_path": "macros/materializations/configs.sql", "unique_id": "macro.dbt.set_sql_header", "macro_sql": "{% macro set_sql_header(config) -%}\n {{ config.set('sql_header', caller()) }}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.567872, "supported_languages": null}, "macro.dbt.should_full_refresh": {"name": "should_full_refresh", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/configs.sql", "original_file_path": "macros/materializations/configs.sql", "unique_id": "macro.dbt.should_full_refresh", "macro_sql": "{% macro should_full_refresh() %}\n {% set config_full_refresh = config.get('full_refresh') %}\n {% if config_full_refresh is none %}\n {% set config_full_refresh = flags.FULL_REFRESH %}\n {% endif %}\n {% do return(config_full_refresh) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.568131, "supported_languages": null}, "macro.dbt.should_store_failures": {"name": "should_store_failures", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/configs.sql", "original_file_path": "macros/materializations/configs.sql", "unique_id": "macro.dbt.should_store_failures", "macro_sql": "{% macro should_store_failures() %}\n {% set config_store_failures = config.get('store_failures') %}\n {% if config_store_failures is none %}\n {% set config_store_failures = flags.STORE_FAILURES %}\n {% endif %}\n {% do return(config_store_failures) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.568392, 
"supported_languages": null}, "macro.dbt.snapshot_merge_sql": {"name": "snapshot_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/snapshot_merge.sql", "original_file_path": "macros/materializations/snapshots/snapshot_merge.sql", "unique_id": "macro.dbt.snapshot_merge_sql", "macro_sql": "{% macro snapshot_merge_sql(target, source, insert_cols) -%}\n {{ adapter.dispatch('snapshot_merge_sql', 'dbt')(target, source, insert_cols) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__snapshot_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.568773, "supported_languages": null}, "macro.dbt.default__snapshot_merge_sql": {"name": "default__snapshot_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/snapshot_merge.sql", "original_file_path": "macros/materializations/snapshots/snapshot_merge.sql", "unique_id": "macro.dbt.default__snapshot_merge_sql", "macro_sql": "{% macro default__snapshot_merge_sql(target, source, insert_cols) -%}\n {%- set insert_cols_csv = insert_cols | join(', ') -%}\n\n merge into {{ target }} as DBT_INTERNAL_DEST\n using {{ source }} as DBT_INTERNAL_SOURCE\n on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id\n\n when matched\n and DBT_INTERNAL_DEST.dbt_valid_to is null\n and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')\n then update\n set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to\n\n when not matched\n and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'\n then insert ({{ insert_cols_csv }})\n values ({{ insert_cols_csv }})\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.569004, "supported_languages": null}, "macro.dbt.strategy_dispatch": {"name": "strategy_dispatch", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.strategy_dispatch", "macro_sql": "{% macro strategy_dispatch(name) -%}\n{% set original_name = name %}\n {% if '.' 
in name %}\n {% set package_name, name = name.split(\".\", 1) %}\n {% else %}\n {% set package_name = none %}\n {% endif %}\n\n {% if package_name is none %}\n {% set package_context = context %}\n {% elif package_name in context %}\n {% set package_context = context[package_name] %}\n {% else %}\n {% set error_msg %}\n Could not find package '{{package_name}}', called with '{{original_name}}'\n {% endset %}\n {{ exceptions.raise_compiler_error(error_msg | trim) }}\n {% endif %}\n\n {%- set search_name = 'snapshot_' ~ name ~ '_strategy' -%}\n\n {% if search_name not in package_context %}\n {% set error_msg %}\n The specified strategy macro '{{name}}' was not found in package '{{ package_name }}'\n {% endset %}\n {{ exceptions.raise_compiler_error(error_msg | trim) }}\n {% endif %}\n {{ return(package_context[search_name]) }}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.572018, "supported_languages": null}, "macro.dbt.snapshot_hash_arguments": {"name": "snapshot_hash_arguments", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.snapshot_hash_arguments", "macro_sql": "{% macro snapshot_hash_arguments(args) -%}\n {{ adapter.dispatch('snapshot_hash_arguments', 'dbt')(args) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__snapshot_hash_arguments"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.572164, "supported_languages": null}, "macro.dbt.default__snapshot_hash_arguments": {"name": "default__snapshot_hash_arguments", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.default__snapshot_hash_arguments", "macro_sql": "{% macro default__snapshot_hash_arguments(args) -%}\n md5({%- for arg in args -%}\n coalesce(cast({{ arg }} as varchar ), '')\n {% if not loop.last %} || '|' || {% endif %}\n {%- endfor -%})\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5723531, "supported_languages": null}, "macro.dbt.snapshot_timestamp_strategy": {"name": "snapshot_timestamp_strategy", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.snapshot_timestamp_strategy", "macro_sql": "{% macro snapshot_timestamp_strategy(node, snapshotted_rel, current_rel, config, target_exists) %}\n {% set primary_key = config['unique_key'] %}\n {% set updated_at = config['updated_at'] %}\n {% set invalidate_hard_deletes = config.get('invalidate_hard_deletes', false) %}\n\n {#/*\n The snapshot relation might not have an {{ updated_at }} value if the\n snapshot strategy is changed from `check` to `timestamp`. 
We\n should use a dbt-created column for the comparison in the snapshot\n table instead of assuming that the user-supplied {{ updated_at }}\n will be present in the historical data.\n\n See https://github.com/dbt-labs/dbt-core/issues/2350\n */ #}\n {% set row_changed_expr -%}\n ({{ snapshotted_rel }}.dbt_valid_from < {{ current_rel }}.{{ updated_at }})\n {%- endset %}\n\n {% set scd_id_expr = snapshot_hash_arguments([primary_key, updated_at]) %}\n\n {% do return({\n \"unique_key\": primary_key,\n \"updated_at\": updated_at,\n \"row_changed\": row_changed_expr,\n \"scd_id\": scd_id_expr,\n \"invalidate_hard_deletes\": invalidate_hard_deletes\n }) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.snapshot_hash_arguments"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.572973, "supported_languages": null}, "macro.dbt.snapshot_string_as_time": {"name": "snapshot_string_as_time", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.snapshot_string_as_time", "macro_sql": "{% macro snapshot_string_as_time(timestamp) -%}\n {{ adapter.dispatch('snapshot_string_as_time', 'dbt')(timestamp) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__snapshot_string_as_time"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.573117, "supported_languages": null}, "macro.dbt.default__snapshot_string_as_time": {"name": "default__snapshot_string_as_time", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.default__snapshot_string_as_time", "macro_sql": "{% macro default__snapshot_string_as_time(timestamp) %}\n {% do exceptions.raise_not_implemented(\n 'snapshot_string_as_time macro not implemented for adapter '+adapter.type()\n ) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.573271, "supported_languages": null}, "macro.dbt.snapshot_check_all_get_existing_columns": {"name": "snapshot_check_all_get_existing_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.snapshot_check_all_get_existing_columns", "macro_sql": "{% macro snapshot_check_all_get_existing_columns(node, target_exists, check_cols_config) -%}\n {%- if not target_exists -%}\n {#-- no table yet -> return whatever the query does --#}\n {{ return((false, query_columns)) }}\n {%- endif -%}\n\n {#-- handle any schema changes --#}\n {%- set target_relation = adapter.get_relation(database=node.database, schema=node.schema, identifier=node.alias) -%}\n\n {% if check_cols_config == 'all' %}\n {%- set query_columns = get_columns_in_query(node['compiled_code']) -%}\n\n {% elif check_cols_config is iterable and (check_cols_config | length) > 0 %}\n {#-- query for proper casing/quoting, to support comparison below --#}\n {%- set select_check_cols_from_target -%}\n select {{ check_cols_config | join(', ') }} from ({{ node['compiled_code'] }}) subq\n {%- 
endset -%}\n {% set query_columns = get_columns_in_query(select_check_cols_from_target) %}\n\n {% else %}\n {% do exceptions.raise_compiler_error(\"Invalid value for 'check_cols': \" ~ check_cols_config) %}\n {% endif %}\n\n {%- set existing_cols = adapter.get_columns_in_relation(target_relation) | map(attribute = 'name') | list -%}\n {%- set ns = namespace() -%} {#-- handle for-loop scoping with a namespace --#}\n {%- set ns.column_added = false -%}\n\n {%- set intersection = [] -%}\n {%- for col in query_columns -%}\n {%- if col in existing_cols -%}\n {%- do intersection.append(adapter.quote(col)) -%}\n {%- else -%}\n {% set ns.column_added = true %}\n {%- endif -%}\n {%- endfor -%}\n {{ return((ns.column_added, intersection)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.get_columns_in_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.574463, "supported_languages": null}, "macro.dbt.snapshot_check_strategy": {"name": "snapshot_check_strategy", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/strategies.sql", "original_file_path": "macros/materializations/snapshots/strategies.sql", "unique_id": "macro.dbt.snapshot_check_strategy", "macro_sql": "{% macro snapshot_check_strategy(node, snapshotted_rel, current_rel, config, target_exists) %}\n {% set check_cols_config = config['check_cols'] %}\n {% set primary_key = config['unique_key'] %}\n {% set invalidate_hard_deletes = config.get('invalidate_hard_deletes', false) %}\n {% set updated_at = config.get('updated_at', snapshot_get_time()) %}\n\n {% set column_added = false %}\n\n {% set column_added, check_cols = snapshot_check_all_get_existing_columns(node, target_exists, check_cols_config) %}\n\n {%- set row_changed_expr -%}\n (\n {%- if column_added -%}\n {{ get_true_sql() }}\n {%- else -%}\n {%- for col in check_cols -%}\n {{ snapshotted_rel }}.{{ col }} != {{ current_rel }}.{{ col }}\n or\n (\n (({{ snapshotted_rel }}.{{ col }} is null) and not ({{ current_rel }}.{{ col }} is null))\n or\n ((not {{ snapshotted_rel }}.{{ col }} is null) and ({{ current_rel }}.{{ col }} is null))\n )\n {%- if not loop.last %} or {% endif -%}\n {%- endfor -%}\n {%- endif -%}\n )\n {%- endset %}\n\n {% set scd_id_expr = snapshot_hash_arguments([primary_key, updated_at]) %}\n\n {% do return({\n \"unique_key\": primary_key,\n \"updated_at\": updated_at,\n \"row_changed\": row_changed_expr,\n \"scd_id\": scd_id_expr,\n \"invalidate_hard_deletes\": invalidate_hard_deletes\n }) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.snapshot_get_time", "macro.dbt.snapshot_check_all_get_existing_columns", "macro.dbt.get_true_sql", "macro.dbt.snapshot_hash_arguments"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5756218, "supported_languages": null}, "macro.dbt.create_columns": {"name": "create_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.create_columns", "macro_sql": "{% macro create_columns(relation, columns) %}\n {{ adapter.dispatch('create_columns', 'dbt')(relation, columns) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__create_columns"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, 
"arguments": [], "created_at": 1680025829.579034, "supported_languages": null}, "macro.dbt.default__create_columns": {"name": "default__create_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.default__create_columns", "macro_sql": "{% macro default__create_columns(relation, columns) %}\n {% for column in columns %}\n {% call statement() %}\n alter table {{ relation }} add column \"{{ column.name }}\" {{ column.data_type }};\n {% endcall %}\n {% endfor %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.579274, "supported_languages": null}, "macro.dbt.post_snapshot": {"name": "post_snapshot", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.post_snapshot", "macro_sql": "{% macro post_snapshot(staging_relation) %}\n {{ adapter.dispatch('post_snapshot', 'dbt')(staging_relation) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__post_snapshot"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.579415, "supported_languages": null}, "macro.dbt.default__post_snapshot": {"name": "default__post_snapshot", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.default__post_snapshot", "macro_sql": "{% macro default__post_snapshot(staging_relation) %}\n {# no-op #}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.579489, "supported_languages": null}, "macro.dbt.get_true_sql": {"name": "get_true_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.get_true_sql", "macro_sql": "{% macro get_true_sql() %}\n {{ adapter.dispatch('get_true_sql', 'dbt')() }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_true_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.579612, "supported_languages": null}, "macro.dbt.default__get_true_sql": {"name": "default__get_true_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.default__get_true_sql", "macro_sql": "{% macro default__get_true_sql() %}\n {{ return('TRUE') }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.579707, "supported_languages": null}, "macro.dbt.snapshot_staging_table": {"name": "snapshot_staging_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": 
"macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.snapshot_staging_table", "macro_sql": "{% macro snapshot_staging_table(strategy, source_sql, target_relation) -%}\n {{ adapter.dispatch('snapshot_staging_table', 'dbt')(strategy, source_sql, target_relation) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__snapshot_staging_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.579885, "supported_languages": null}, "macro.dbt.default__snapshot_staging_table": {"name": "default__snapshot_staging_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.default__snapshot_staging_table", "macro_sql": "{% macro default__snapshot_staging_table(strategy, source_sql, target_relation) -%}\n\n with snapshot_query as (\n\n {{ source_sql }}\n\n ),\n\n snapshotted_data as (\n\n select *,\n {{ strategy.unique_key }} as dbt_unique_key\n\n from {{ target_relation }}\n where dbt_valid_to is null\n\n ),\n\n insertions_source_data as (\n\n select\n *,\n {{ strategy.unique_key }} as dbt_unique_key,\n {{ strategy.updated_at }} as dbt_updated_at,\n {{ strategy.updated_at }} as dbt_valid_from,\n nullif({{ strategy.updated_at }}, {{ strategy.updated_at }}) as dbt_valid_to,\n {{ strategy.scd_id }} as dbt_scd_id\n\n from snapshot_query\n ),\n\n updates_source_data as (\n\n select\n *,\n {{ strategy.unique_key }} as dbt_unique_key,\n {{ strategy.updated_at }} as dbt_updated_at,\n {{ strategy.updated_at }} as dbt_valid_from,\n {{ strategy.updated_at }} as dbt_valid_to\n\n from snapshot_query\n ),\n\n {%- if strategy.invalidate_hard_deletes %}\n\n deletes_source_data as (\n\n select\n *,\n {{ strategy.unique_key }} as dbt_unique_key\n from snapshot_query\n ),\n {% endif %}\n\n insertions as (\n\n select\n 'insert' as dbt_change_type,\n source_data.*\n\n from insertions_source_data as source_data\n left outer join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key\n where snapshotted_data.dbt_unique_key is null\n or (\n snapshotted_data.dbt_unique_key is not null\n and (\n {{ strategy.row_changed }}\n )\n )\n\n ),\n\n updates as (\n\n select\n 'update' as dbt_change_type,\n source_data.*,\n snapshotted_data.dbt_scd_id\n\n from updates_source_data as source_data\n join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key\n where (\n {{ strategy.row_changed }}\n )\n )\n\n {%- if strategy.invalidate_hard_deletes -%}\n ,\n\n deletes as (\n\n select\n 'delete' as dbt_change_type,\n source_data.*,\n {{ snapshot_get_time() }} as dbt_valid_from,\n {{ snapshot_get_time() }} as dbt_updated_at,\n {{ snapshot_get_time() }} as dbt_valid_to,\n snapshotted_data.dbt_scd_id\n\n from snapshotted_data\n left join deletes_source_data as source_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key\n where source_data.dbt_unique_key is null\n )\n {%- endif %}\n\n select * from insertions\n union all\n select * from updates\n {%- if strategy.invalidate_hard_deletes %}\n union all\n select * from deletes\n {%- endif %}\n\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.snapshot_get_time"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.58063, "supported_languages": null}, 
"macro.dbt.build_snapshot_table": {"name": "build_snapshot_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.build_snapshot_table", "macro_sql": "{% macro build_snapshot_table(strategy, sql) -%}\n {{ adapter.dispatch('build_snapshot_table', 'dbt')(strategy, sql) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__build_snapshot_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.580791, "supported_languages": null}, "macro.dbt.default__build_snapshot_table": {"name": "default__build_snapshot_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.default__build_snapshot_table", "macro_sql": "{% macro default__build_snapshot_table(strategy, sql) %}\n\n select *,\n {{ strategy.scd_id }} as dbt_scd_id,\n {{ strategy.updated_at }} as dbt_updated_at,\n {{ strategy.updated_at }} as dbt_valid_from,\n nullif({{ strategy.updated_at }}, {{ strategy.updated_at }}) as dbt_valid_to\n from (\n {{ sql }}\n ) sbq\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.581003, "supported_languages": null}, "macro.dbt.build_snapshot_staging_table": {"name": "build_snapshot_staging_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/helpers.sql", "original_file_path": "macros/materializations/snapshots/helpers.sql", "unique_id": "macro.dbt.build_snapshot_staging_table", "macro_sql": "{% macro build_snapshot_staging_table(strategy, sql, target_relation) %}\n {% set temp_relation = make_temp_relation(target_relation) %}\n\n {% set select = snapshot_staging_table(strategy, sql, target_relation) %}\n\n {% call statement('build_snapshot_staging_relation') %}\n {{ create_table_as(True, temp_relation, select) }}\n {% endcall %}\n\n {% do return(temp_relation) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.make_temp_relation", "macro.dbt.snapshot_staging_table", "macro.dbt.statement", "macro.dbt.create_table_as"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.581364, "supported_languages": null}, "macro.dbt.materialization_snapshot_default": {"name": "materialization_snapshot_default", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/snapshots/snapshot.sql", "original_file_path": "macros/materializations/snapshots/snapshot.sql", "unique_id": "macro.dbt.materialization_snapshot_default", "macro_sql": "{% materialization snapshot, default %}\n {%- set config = model['config'] -%}\n\n {%- set target_table = model.get('alias', model.get('name')) -%}\n\n {%- set strategy_name = config.get('strategy') -%}\n {%- set unique_key = config.get('unique_key') %}\n -- grab current tables grants config for comparision later on\n {%- set grant_config = config.get('grants') -%}\n\n {% set target_relation_exists, target_relation = get_or_create_relation(\n database=model.database,\n schema=model.schema,\n identifier=target_table,\n type='table') -%}\n\n {%- if not target_relation.is_table -%}\n {% do 
exceptions.relation_wrong_type(target_relation, 'table') %}\n {%- endif -%}\n\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n {% set strategy_macro = strategy_dispatch(strategy_name) %}\n {% set strategy = strategy_macro(model, \"snapshotted_data\", \"source_data\", config, target_relation_exists) %}\n\n {% if not target_relation_exists %}\n\n {% set build_sql = build_snapshot_table(strategy, model['compiled_code']) %}\n {% set final_sql = create_table_as(False, target_relation, build_sql) %}\n\n {% else %}\n\n {{ adapter.valid_snapshot_target(target_relation) }}\n\n {% set staging_table = build_snapshot_staging_table(strategy, sql, target_relation) %}\n\n -- this may no-op if the database does not require column expansion\n {% do adapter.expand_target_column_types(from_relation=staging_table,\n to_relation=target_relation) %}\n\n {% set missing_columns = adapter.get_missing_columns(staging_table, target_relation)\n | rejectattr('name', 'equalto', 'dbt_change_type')\n | rejectattr('name', 'equalto', 'DBT_CHANGE_TYPE')\n | rejectattr('name', 'equalto', 'dbt_unique_key')\n | rejectattr('name', 'equalto', 'DBT_UNIQUE_KEY')\n | list %}\n\n {% do create_columns(target_relation, missing_columns) %}\n\n {% set source_columns = adapter.get_columns_in_relation(staging_table)\n | rejectattr('name', 'equalto', 'dbt_change_type')\n | rejectattr('name', 'equalto', 'DBT_CHANGE_TYPE')\n | rejectattr('name', 'equalto', 'dbt_unique_key')\n | rejectattr('name', 'equalto', 'DBT_UNIQUE_KEY')\n | list %}\n\n {% set quoted_source_columns = [] %}\n {% for column in source_columns %}\n {% do quoted_source_columns.append(adapter.quote(column.name)) %}\n {% endfor %}\n\n {% set final_sql = snapshot_merge_sql(\n target = target_relation,\n source = staging_table,\n insert_cols = quoted_source_columns\n )\n %}\n\n {% endif %}\n\n {% call statement('main') %}\n {{ final_sql }}\n {% endcall %}\n\n {% set should_revoke = should_revoke(target_relation_exists, full_refresh_mode=False) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n {% if not target_relation_exists %}\n {% do create_indexes(target_relation) %}\n {% endif %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n {{ adapter.commit() }}\n\n {% if staging_table is defined %}\n {% do post_snapshot(staging_table) %}\n {% endif %}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{% endmaterialization %}", "depends_on": {"macros": ["macro.dbt.get_or_create_relation", "macro.dbt.run_hooks", "macro.dbt.strategy_dispatch", "macro.dbt.build_snapshot_table", "macro.dbt.create_table_as", "macro.dbt.build_snapshot_staging_table", "macro.dbt.create_columns", "macro.dbt.snapshot_merge_sql", "macro.dbt.statement", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs", "macro.dbt.create_indexes", "macro.dbt.post_snapshot"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5864599, "supported_languages": ["sql"]}, "macro.dbt.materialization_test_default": {"name": "materialization_test_default", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/tests/test.sql", "original_file_path": "macros/materializations/tests/test.sql", "unique_id": "macro.dbt.materialization_test_default", "macro_sql": "{%- 
materialization test, default -%}\n\n {% set relations = [] %}\n\n {% if should_store_failures() %}\n\n {% set identifier = model['alias'] %}\n {% set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) %}\n {% set target_relation = api.Relation.create(\n identifier=identifier, schema=schema, database=database, type='table') -%} %}\n\n {% if old_relation %}\n {% do adapter.drop_relation(old_relation) %}\n {% endif %}\n\n {% call statement(auto_begin=True) %}\n {{ create_table_as(False, target_relation, sql) }}\n {% endcall %}\n\n {% do relations.append(target_relation) %}\n\n {% set main_sql %}\n select *\n from {{ target_relation }}\n {% endset %}\n\n {{ adapter.commit() }}\n\n {% else %}\n\n {% set main_sql = sql %}\n\n {% endif %}\n\n {% set limit = config.get('limit') %}\n {% set fail_calc = config.get('fail_calc') %}\n {% set warn_if = config.get('warn_if') %}\n {% set error_if = config.get('error_if') %}\n\n {% call statement('main', fetch_result=True) -%}\n\n {{ get_test_sql(main_sql, fail_calc, warn_if, error_if, limit)}}\n\n {%- endcall %}\n\n {{ return({'relations': relations}) }}\n\n{%- endmaterialization -%}", "depends_on": {"macros": ["macro.dbt.should_store_failures", "macro.dbt.statement", "macro.dbt.create_table_as", "macro.dbt.get_test_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5882041, "supported_languages": ["sql"]}, "macro.dbt.get_test_sql": {"name": "get_test_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/tests/helpers.sql", "original_file_path": "macros/materializations/tests/helpers.sql", "unique_id": "macro.dbt.get_test_sql", "macro_sql": "{% macro get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}\n {{ adapter.dispatch('get_test_sql', 'dbt')(main_sql, fail_calc, warn_if, error_if, limit) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_test_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.588593, "supported_languages": null}, "macro.dbt.default__get_test_sql": {"name": "default__get_test_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/tests/helpers.sql", "original_file_path": "macros/materializations/tests/helpers.sql", "unique_id": "macro.dbt.default__get_test_sql", "macro_sql": "{% macro default__get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}\n select\n {{ fail_calc }} as failures,\n {{ fail_calc }} {{ warn_if }} as should_warn,\n {{ fail_calc }} {{ error_if }} as should_error\n from (\n {{ main_sql }}\n {{ \"limit \" ~ limit if limit != none }}\n ) dbt_internal_test\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.588854, "supported_languages": null}, "macro.dbt.get_where_subquery": {"name": "get_where_subquery", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/tests/where_subquery.sql", "original_file_path": "macros/materializations/tests/where_subquery.sql", "unique_id": "macro.dbt.get_where_subquery", "macro_sql": "{% macro get_where_subquery(relation) -%}\n {% do return(adapter.dispatch('get_where_subquery', 'dbt')(relation)) %}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_where_subquery"]}, 
"description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.589171, "supported_languages": null}, "macro.dbt.default__get_where_subquery": {"name": "default__get_where_subquery", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/tests/where_subquery.sql", "original_file_path": "macros/materializations/tests/where_subquery.sql", "unique_id": "macro.dbt.default__get_where_subquery", "macro_sql": "{% macro default__get_where_subquery(relation) -%}\n {% set where = config.get('where', '') %}\n {% if where %}\n {%- set filtered -%}\n (select * from {{ relation }} where {{ where }}) dbt_subquery\n {%- endset -%}\n {% do return(filtered) %}\n {%- else -%}\n {% do return(relation) %}\n {%- endif -%}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.589488, "supported_languages": null}, "macro.dbt.get_quoted_csv": {"name": "get_quoted_csv", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/column_helpers.sql", "original_file_path": "macros/materializations/models/incremental/column_helpers.sql", "unique_id": "macro.dbt.get_quoted_csv", "macro_sql": "{% macro get_quoted_csv(column_names) %}\n\n {% set quoted = [] %}\n {% for col in column_names -%}\n {%- do quoted.append(adapter.quote(col)) -%}\n {%- endfor %}\n\n {%- set dest_cols_csv = quoted | join(', ') -%}\n {{ return(dest_cols_csv) }}\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5908349, "supported_languages": null}, "macro.dbt.diff_columns": {"name": "diff_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/column_helpers.sql", "original_file_path": "macros/materializations/models/incremental/column_helpers.sql", "unique_id": "macro.dbt.diff_columns", "macro_sql": "{% macro diff_columns(source_columns, target_columns) %}\n\n {% set result = [] %}\n {% set source_names = source_columns | map(attribute = 'column') | list %}\n {% set target_names = target_columns | map(attribute = 'column') | list %}\n\n {# --check whether the name attribute exists in the target - this does not perform a data type check #}\n {% for sc in source_columns %}\n {% if sc.name not in target_names %}\n {{ result.append(sc) }}\n {% endif %}\n {% endfor %}\n\n {{ return(result) }}\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.591302, "supported_languages": null}, "macro.dbt.diff_column_data_types": {"name": "diff_column_data_types", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/column_helpers.sql", "original_file_path": "macros/materializations/models/incremental/column_helpers.sql", "unique_id": "macro.dbt.diff_column_data_types", "macro_sql": "{% macro diff_column_data_types(source_columns, target_columns) %}\n\n {% set result = [] %}\n {% for sc in source_columns %}\n {% set tc = target_columns | selectattr(\"name\", \"equalto\", sc.name) | list | first %}\n {% if tc %}\n {% if sc.data_type != tc.data_type and not sc.can_expand_to(other_column=tc) %}\n {{ result.append( { 'column_name': 
tc.name, 'new_type': sc.data_type } ) }}\n {% endif %}\n {% endif %}\n {% endfor %}\n\n {{ return(result) }}\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5918689, "supported_languages": null}, "macro.dbt.get_merge_update_columns": {"name": "get_merge_update_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/column_helpers.sql", "original_file_path": "macros/materializations/models/incremental/column_helpers.sql", "unique_id": "macro.dbt.get_merge_update_columns", "macro_sql": "{% macro get_merge_update_columns(merge_update_columns, merge_exclude_columns, dest_columns) %}\n {{ return(adapter.dispatch('get_merge_update_columns', 'dbt')(merge_update_columns, merge_exclude_columns, dest_columns)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_merge_update_columns"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.5920708, "supported_languages": null}, "macro.dbt.default__get_merge_update_columns": {"name": "default__get_merge_update_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/column_helpers.sql", "original_file_path": "macros/materializations/models/incremental/column_helpers.sql", "unique_id": "macro.dbt.default__get_merge_update_columns", "macro_sql": "{% macro default__get_merge_update_columns(merge_update_columns, merge_exclude_columns, dest_columns) %}\n {%- set default_cols = dest_columns | map(attribute=\"quoted\") | list -%}\n\n {%- if merge_update_columns and merge_exclude_columns -%}\n {{ exceptions.raise_compiler_error(\n 'Model cannot specify merge_update_columns and merge_exclude_columns. 
Please update model to use only one config'\n )}}\n {%- elif merge_update_columns -%}\n {%- set update_columns = merge_update_columns -%}\n {%- elif merge_exclude_columns -%}\n {%- set update_columns = [] -%}\n {%- for column in dest_columns -%}\n {% if column.column | lower not in merge_exclude_columns | map(\"lower\") | list %}\n {%- do update_columns.append(column.quoted) -%}\n {% endif %}\n {%- endfor -%}\n {%- else -%}\n {%- set update_columns = default_cols -%}\n {%- endif -%}\n\n {{ return(update_columns) }}\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.592672, "supported_languages": null}, "macro.dbt.get_merge_sql": {"name": "get_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/merge.sql", "original_file_path": "macros/materializations/models/incremental/merge.sql", "unique_id": "macro.dbt.get_merge_sql", "macro_sql": "{% macro get_merge_sql(target, source, unique_key, dest_columns, incremental_predicates=none) -%}\n -- back compat for old kwarg name\n {% set incremental_predicates = kwargs.get('predicates', incremental_predicates) %}\n {{ adapter.dispatch('get_merge_sql', 'dbt')(target, source, unique_key, dest_columns, incremental_predicates) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.597745, "supported_languages": null}, "macro.dbt.default__get_merge_sql": {"name": "default__get_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/merge.sql", "original_file_path": "macros/materializations/models/incremental/merge.sql", "unique_id": "macro.dbt.default__get_merge_sql", "macro_sql": "{% macro default__get_merge_sql(target, source, unique_key, dest_columns, incremental_predicates=none) -%}\n {%- set predicates = [] if incremental_predicates is none else [] + incremental_predicates -%}\n {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute=\"name\")) -%}\n {%- set merge_update_columns = config.get('merge_update_columns') -%}\n {%- set merge_exclude_columns = config.get('merge_exclude_columns') -%}\n {%- set update_columns = get_merge_update_columns(merge_update_columns, merge_exclude_columns, dest_columns) -%}\n {%- set sql_header = config.get('sql_header', none) -%}\n\n {% if unique_key %}\n {% if unique_key is sequence and unique_key is not mapping and unique_key is not string %}\n {% for key in unique_key %}\n {% set this_key_match %}\n DBT_INTERNAL_SOURCE.{{ key }} = DBT_INTERNAL_DEST.{{ key }}\n {% endset %}\n {% do predicates.append(this_key_match) %}\n {% endfor %}\n {% else %}\n {% set unique_key_match %}\n DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}\n {% endset %}\n {% do predicates.append(unique_key_match) %}\n {% endif %}\n {% else %}\n {% do predicates.append('FALSE') %}\n {% endif %}\n\n {{ sql_header if sql_header is not none }}\n\n merge into {{ target }} as DBT_INTERNAL_DEST\n using {{ source }} as DBT_INTERNAL_SOURCE\n on {{\"(\" ~ predicates | join(\") and (\") ~ \")\"}}\n\n {% if unique_key %}\n when matched then update set\n {% for column_name in update_columns -%}\n {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}\n {%- if not loop.last %}, {%- endif %}\n {%- endfor %}\n {% 
endif %}\n\n when not matched then insert\n ({{ dest_cols_csv }})\n values\n ({{ dest_cols_csv }})\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_quoted_csv", "macro.dbt.get_merge_update_columns"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.599158, "supported_languages": null}, "macro.dbt.get_delete_insert_merge_sql": {"name": "get_delete_insert_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/merge.sql", "original_file_path": "macros/materializations/models/incremental/merge.sql", "unique_id": "macro.dbt.get_delete_insert_merge_sql", "macro_sql": "{% macro get_delete_insert_merge_sql(target, source, unique_key, dest_columns, incremental_predicates) -%}\n {{ adapter.dispatch('get_delete_insert_merge_sql', 'dbt')(target, source, unique_key, dest_columns, incremental_predicates) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_delete_insert_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.599381, "supported_languages": null}, "macro.dbt.default__get_delete_insert_merge_sql": {"name": "default__get_delete_insert_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/merge.sql", "original_file_path": "macros/materializations/models/incremental/merge.sql", "unique_id": "macro.dbt.default__get_delete_insert_merge_sql", "macro_sql": "{% macro default__get_delete_insert_merge_sql(target, source, unique_key, dest_columns, incremental_predicates) -%}\n\n {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute=\"name\")) -%}\n\n {% if unique_key %}\n {% if unique_key is sequence and unique_key is not string %}\n delete from {{target }}\n using {{ source }}\n where (\n {% for key in unique_key %}\n {{ source }}.{{ key }} = {{ target }}.{{ key }}\n {{ \"and \" if not loop.last}}\n {% endfor %}\n {% if incremental_predicates %}\n {% for predicate in incremental_predicates %}\n and {{ predicate }}\n {% endfor %}\n {% endif %}\n );\n {% else %}\n delete from {{ target }}\n where (\n {{ unique_key }}) in (\n select ({{ unique_key }})\n from {{ source }}\n )\n {%- if incremental_predicates %}\n {% for predicate in incremental_predicates %}\n and {{ predicate }}\n {% endfor %}\n {%- endif -%};\n\n {% endif %}\n {% endif %}\n\n insert into {{ target }} ({{ dest_cols_csv }})\n (\n select {{ dest_cols_csv }}\n from {{ source }}\n )\n\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.get_quoted_csv"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.600228, "supported_languages": null}, "macro.dbt.get_insert_overwrite_merge_sql": {"name": "get_insert_overwrite_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/merge.sql", "original_file_path": "macros/materializations/models/incremental/merge.sql", "unique_id": "macro.dbt.get_insert_overwrite_merge_sql", "macro_sql": "{% macro get_insert_overwrite_merge_sql(target, source, dest_columns, predicates, include_sql_header=false) -%}\n {{ adapter.dispatch('get_insert_overwrite_merge_sql', 'dbt')(target, source, dest_columns, predicates, include_sql_header) }}\n{%- endmacro %}", "depends_on": {"macros": 
["macro.dbt.default__get_insert_overwrite_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.600458, "supported_languages": null}, "macro.dbt.default__get_insert_overwrite_merge_sql": {"name": "default__get_insert_overwrite_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/merge.sql", "original_file_path": "macros/materializations/models/incremental/merge.sql", "unique_id": "macro.dbt.default__get_insert_overwrite_merge_sql", "macro_sql": "{% macro default__get_insert_overwrite_merge_sql(target, source, dest_columns, predicates, include_sql_header) -%}\n {#-- The only time include_sql_header is True: --#}\n {#-- BigQuery + insert_overwrite strategy + \"static\" partitions config --#}\n {#-- We should consider including the sql header at the materialization level instead --#}\n\n {%- set predicates = [] if predicates is none else [] + predicates -%}\n {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute=\"name\")) -%}\n {%- set sql_header = config.get('sql_header', none) -%}\n\n {{ sql_header if sql_header is not none and include_sql_header }}\n\n merge into {{ target }} as DBT_INTERNAL_DEST\n using {{ source }} as DBT_INTERNAL_SOURCE\n on FALSE\n\n when not matched by source\n {% if predicates %} and {{ predicates | join(' and ') }} {% endif %}\n then delete\n\n when not matched then insert\n ({{ dest_cols_csv }})\n values\n ({{ dest_cols_csv }})\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_quoted_csv"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6010041, "supported_languages": null}, "macro.dbt.is_incremental": {"name": "is_incremental", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/is_incremental.sql", "original_file_path": "macros/materializations/models/incremental/is_incremental.sql", "unique_id": "macro.dbt.is_incremental", "macro_sql": "{% macro is_incremental() %}\n {#-- do not run introspective queries in parsing #}\n {% if not execute %}\n {{ return(False) }}\n {% else %}\n {% set relation = adapter.get_relation(this.database, this.schema, this.table) %}\n {{ return(relation is not none\n and relation.type == 'table'\n and model.config.materialized == 'incremental'\n and not should_full_refresh()) }}\n {% endif %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.should_full_refresh"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.60154, "supported_languages": null}, "macro.dbt.get_incremental_append_sql": {"name": "get_incremental_append_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.get_incremental_append_sql", "macro_sql": "{% macro get_incremental_append_sql(arg_dict) %}\n\n {{ return(adapter.dispatch('get_incremental_append_sql', 'dbt')(arg_dict)) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_incremental_append_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.602304, "supported_languages": null}, 
"macro.dbt.default__get_incremental_append_sql": {"name": "default__get_incremental_append_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.default__get_incremental_append_sql", "macro_sql": "{% macro default__get_incremental_append_sql(arg_dict) %}\n\n {% do return(get_insert_into_sql(arg_dict[\"target_relation\"], arg_dict[\"temp_relation\"], arg_dict[\"dest_columns\"])) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_insert_into_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.602527, "supported_languages": null}, "macro.dbt.get_incremental_delete_insert_sql": {"name": "get_incremental_delete_insert_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.get_incremental_delete_insert_sql", "macro_sql": "{% macro get_incremental_delete_insert_sql(arg_dict) %}\n\n {{ return(adapter.dispatch('get_incremental_delete_insert_sql', 'dbt')(arg_dict)) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_incremental_delete_insert_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6026812, "supported_languages": null}, "macro.dbt.default__get_incremental_delete_insert_sql": {"name": "default__get_incremental_delete_insert_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.default__get_incremental_delete_insert_sql", "macro_sql": "{% macro default__get_incremental_delete_insert_sql(arg_dict) %}\n\n {% do return(get_delete_insert_merge_sql(arg_dict[\"target_relation\"], arg_dict[\"temp_relation\"], arg_dict[\"unique_key\"], arg_dict[\"dest_columns\"], arg_dict[\"incremental_predicates\"])) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_delete_insert_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.602931, "supported_languages": null}, "macro.dbt.get_incremental_merge_sql": {"name": "get_incremental_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.get_incremental_merge_sql", "macro_sql": "{% macro get_incremental_merge_sql(arg_dict) %}\n\n {{ return(adapter.dispatch('get_incremental_merge_sql', 'dbt')(arg_dict)) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_incremental_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6030898, "supported_languages": null}, "macro.dbt.default__get_incremental_merge_sql": {"name": "default__get_incremental_merge_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": 
"macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.default__get_incremental_merge_sql", "macro_sql": "{% macro default__get_incremental_merge_sql(arg_dict) %}\n\n {% do return(get_merge_sql(arg_dict[\"target_relation\"], arg_dict[\"temp_relation\"], arg_dict[\"unique_key\"], arg_dict[\"dest_columns\"], arg_dict[\"incremental_predicates\"])) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6033401, "supported_languages": null}, "macro.dbt.get_incremental_insert_overwrite_sql": {"name": "get_incremental_insert_overwrite_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.get_incremental_insert_overwrite_sql", "macro_sql": "{% macro get_incremental_insert_overwrite_sql(arg_dict) %}\n\n {{ return(adapter.dispatch('get_incremental_insert_overwrite_sql', 'dbt')(arg_dict)) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_incremental_insert_overwrite_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.603498, "supported_languages": null}, "macro.dbt.default__get_incremental_insert_overwrite_sql": {"name": "default__get_incremental_insert_overwrite_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.default__get_incremental_insert_overwrite_sql", "macro_sql": "{% macro default__get_incremental_insert_overwrite_sql(arg_dict) %}\n\n {% do return(get_insert_overwrite_merge_sql(arg_dict[\"target_relation\"], arg_dict[\"temp_relation\"], arg_dict[\"dest_columns\"], arg_dict[\"incremental_predicates\"])) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_insert_overwrite_merge_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.603723, "supported_languages": null}, "macro.dbt.get_incremental_default_sql": {"name": "get_incremental_default_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.get_incremental_default_sql", "macro_sql": "{% macro get_incremental_default_sql(arg_dict) %}\n\n {{ return(adapter.dispatch('get_incremental_default_sql', 'dbt')(arg_dict)) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_incremental_default_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6038861, "supported_languages": null}, "macro.dbt.default__get_incremental_default_sql": {"name": "default__get_incremental_default_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.default__get_incremental_default_sql", "macro_sql": "{% macro default__get_incremental_default_sql(arg_dict) 
%}\n\n {% do return(get_incremental_append_sql(arg_dict)) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_incremental_append_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.604013, "supported_languages": null}, "macro.dbt.get_insert_into_sql": {"name": "get_insert_into_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/strategies.sql", "original_file_path": "macros/materializations/models/incremental/strategies.sql", "unique_id": "macro.dbt.get_insert_into_sql", "macro_sql": "{% macro get_insert_into_sql(target_relation, temp_relation, dest_columns) %}\n\n {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute=\"name\")) -%}\n\n insert into {{ target_relation }} ({{ dest_cols_csv }})\n (\n select {{ dest_cols_csv }}\n from {{ temp_relation }}\n )\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_quoted_csv"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.604246, "supported_languages": null}, "macro.dbt.materialization_incremental_default": {"name": "materialization_incremental_default", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/incremental.sql", "original_file_path": "macros/materializations/models/incremental/incremental.sql", "unique_id": "macro.dbt.materialization_incremental_default", "macro_sql": "{% materialization incremental, default -%}\n\n -- relations\n {%- set existing_relation = load_cached_relation(this) -%}\n {%- set target_relation = this.incorporate(type='table') -%}\n {%- set temp_relation = make_temp_relation(target_relation)-%}\n {%- set intermediate_relation = make_intermediate_relation(target_relation)-%}\n {%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}\n {%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}\n\n -- configs\n {%- set unique_key = config.get('unique_key') -%}\n {%- set full_refresh_mode = (should_full_refresh() or existing_relation.is_view) -%}\n {%- set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') -%}\n\n -- the temp_ and backup_ relations should not already exist in the database; get_relation\n -- will return None in that case. Otherwise, we get a relation that we can drop\n -- later, before we try to use this name for the current operation. 
This has to happen before\n -- BEGIN, in a separate transaction\n {%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation)-%}\n {%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}\n -- grab current tables grants config for comparision later on\n {% set grant_config = config.get('grants') %}\n {{ drop_relation_if_exists(preexisting_intermediate_relation) }}\n {{ drop_relation_if_exists(preexisting_backup_relation) }}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n {% set to_drop = [] %}\n\n {% if existing_relation is none %}\n {% set build_sql = get_create_table_as_sql(False, target_relation, sql) %}\n {% elif full_refresh_mode %}\n {% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}\n {% set need_swap = true %}\n {% else %}\n {% do run_query(get_create_table_as_sql(True, temp_relation, sql)) %}\n {% do adapter.expand_target_column_types(\n from_relation=temp_relation,\n to_relation=target_relation) %}\n {#-- Process schema changes. Returns dict of changes if successful. Use source columns for upserting/merging --#}\n {% set dest_columns = process_schema_changes(on_schema_change, temp_relation, existing_relation) %}\n {% if not dest_columns %}\n {% set dest_columns = adapter.get_columns_in_relation(existing_relation) %}\n {% endif %}\n\n {#-- Get the incremental_strategy, the macro to use for the strategy, and build the sql --#}\n {% set incremental_strategy = config.get('incremental_strategy') or 'default' %}\n {% set incremental_predicates = config.get('predicates', none) or config.get('incremental_predicates', none) %}\n {% set strategy_sql_macro_func = adapter.get_incremental_strategy_macro(context, incremental_strategy) %}\n {% set strategy_arg_dict = ({'target_relation': target_relation, 'temp_relation': temp_relation, 'unique_key': unique_key, 'dest_columns': dest_columns, 'incremental_predicates': incremental_predicates }) %}\n {% set build_sql = strategy_sql_macro_func(strategy_arg_dict) %}\n\n {% endif %}\n\n {% call statement(\"main\") %}\n {{ build_sql }}\n {% endcall %}\n\n {% if need_swap %}\n {% do adapter.rename_relation(target_relation, backup_relation) %}\n {% do adapter.rename_relation(intermediate_relation, target_relation) %}\n {% do to_drop.append(backup_relation) %}\n {% endif %}\n\n {% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n {% if existing_relation is none or existing_relation.is_view or should_full_refresh() %}\n {% do create_indexes(target_relation) %}\n {% endif %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n -- `COMMIT` happens here\n {% do adapter.commit() %}\n\n {% for rel in to_drop %}\n {% do adapter.drop_relation(rel) %}\n {% endfor %}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{%- endmaterialization %}", "depends_on": {"macros": ["macro.dbt.load_cached_relation", "macro.dbt.make_temp_relation", "macro.dbt.make_intermediate_relation", "macro.dbt.make_backup_relation", "macro.dbt.should_full_refresh", "macro.dbt.incremental_validate_on_schema_change", "macro.dbt.drop_relation_if_exists", "macro.dbt.run_hooks", "macro.dbt.get_create_table_as_sql", "macro.dbt.run_query", "macro.dbt.process_schema_changes", 
"macro.dbt.statement", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs", "macro.dbt.create_indexes"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6085372, "supported_languages": ["sql"]}, "macro.dbt.incremental_validate_on_schema_change": {"name": "incremental_validate_on_schema_change", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/on_schema_change.sql", "original_file_path": "macros/materializations/models/incremental/on_schema_change.sql", "unique_id": "macro.dbt.incremental_validate_on_schema_change", "macro_sql": "{% macro incremental_validate_on_schema_change(on_schema_change, default='ignore') %}\n\n {% if on_schema_change not in ['sync_all_columns', 'append_new_columns', 'fail', 'ignore'] %}\n\n {% set log_message = 'Invalid value for on_schema_change (%s) specified. Setting default value of %s.' % (on_schema_change, default) %}\n {% do log(log_message) %}\n\n {{ return(default) }}\n\n {% else %}\n\n {{ return(on_schema_change) }}\n\n {% endif %}\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.613145, "supported_languages": null}, "macro.dbt.check_for_schema_changes": {"name": "check_for_schema_changes", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/on_schema_change.sql", "original_file_path": "macros/materializations/models/incremental/on_schema_change.sql", "unique_id": "macro.dbt.check_for_schema_changes", "macro_sql": "{% macro check_for_schema_changes(source_relation, target_relation) %}\n\n {% set schema_changed = False %}\n\n {%- set source_columns = adapter.get_columns_in_relation(source_relation) -%}\n {%- set target_columns = adapter.get_columns_in_relation(target_relation) -%}\n {%- set source_not_in_target = diff_columns(source_columns, target_columns) -%}\n {%- set target_not_in_source = diff_columns(target_columns, source_columns) -%}\n\n {% set new_target_types = diff_column_data_types(source_columns, target_columns) %}\n\n {% if source_not_in_target != [] %}\n {% set schema_changed = True %}\n {% elif target_not_in_source != [] or new_target_types != [] %}\n {% set schema_changed = True %}\n {% elif new_target_types != [] %}\n {% set schema_changed = True %}\n {% endif %}\n\n {% set changes_dict = {\n 'schema_changed': schema_changed,\n 'source_not_in_target': source_not_in_target,\n 'target_not_in_source': target_not_in_source,\n 'source_columns': source_columns,\n 'target_columns': target_columns,\n 'new_target_types': new_target_types\n } %}\n\n {% set msg %}\n In {{ target_relation }}:\n Schema changed: {{ schema_changed }}\n Source columns not in target: {{ source_not_in_target }}\n Target columns not in source: {{ target_not_in_source }}\n New column types: {{ new_target_types }}\n {% endset %}\n\n {% do log(msg) %}\n\n {{ return(changes_dict) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.diff_columns", "macro.dbt.diff_column_data_types"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.614183, "supported_languages": null}, "macro.dbt.sync_column_schemas": {"name": "sync_column_schemas", "resource_type": "macro", "package_name": "dbt", "path": 
"macros/materializations/models/incremental/on_schema_change.sql", "original_file_path": "macros/materializations/models/incremental/on_schema_change.sql", "unique_id": "macro.dbt.sync_column_schemas", "macro_sql": "{% macro sync_column_schemas(on_schema_change, target_relation, schema_changes_dict) %}\n\n {%- set add_to_target_arr = schema_changes_dict['source_not_in_target'] -%}\n\n {%- if on_schema_change == 'append_new_columns'-%}\n {%- if add_to_target_arr | length > 0 -%}\n {%- do alter_relation_add_remove_columns(target_relation, add_to_target_arr, none) -%}\n {%- endif -%}\n\n {% elif on_schema_change == 'sync_all_columns' %}\n {%- set remove_from_target_arr = schema_changes_dict['target_not_in_source'] -%}\n {%- set new_target_types = schema_changes_dict['new_target_types'] -%}\n\n {% if add_to_target_arr | length > 0 or remove_from_target_arr | length > 0 %}\n {%- do alter_relation_add_remove_columns(target_relation, add_to_target_arr, remove_from_target_arr) -%}\n {% endif %}\n\n {% if new_target_types != [] %}\n {% for ntt in new_target_types %}\n {% set column_name = ntt['column_name'] %}\n {% set new_type = ntt['new_type'] %}\n {% do alter_column_type(target_relation, column_name, new_type) %}\n {% endfor %}\n {% endif %}\n\n {% endif %}\n\n {% set schema_change_message %}\n In {{ target_relation }}:\n Schema change approach: {{ on_schema_change }}\n Columns added: {{ add_to_target_arr }}\n Columns removed: {{ remove_from_target_arr }}\n Data types changed: {{ new_target_types }}\n {% endset %}\n\n {% do log(schema_change_message) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.alter_relation_add_remove_columns", "macro.dbt.alter_column_type"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.615208, "supported_languages": null}, "macro.dbt.process_schema_changes": {"name": "process_schema_changes", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/incremental/on_schema_change.sql", "original_file_path": "macros/materializations/models/incremental/on_schema_change.sql", "unique_id": "macro.dbt.process_schema_changes", "macro_sql": "{% macro process_schema_changes(on_schema_change, source_relation, target_relation) %}\n\n {% if on_schema_change == 'ignore' %}\n\n {{ return({}) }}\n\n {% else %}\n\n {% set schema_changes_dict = check_for_schema_changes(source_relation, target_relation) %}\n\n {% if schema_changes_dict['schema_changed'] %}\n\n {% if on_schema_change == 'fail' %}\n\n {% set fail_msg %}\n The source and target schemas on this incremental model are out of sync!\n They can be reconciled in several ways:\n - set the `on_schema_change` config to either append_new_columns or sync_all_columns, depending on your situation.\n - Re-run the incremental model with `full_refresh: True` to update the target schema.\n - update the schema manually and re-run the process.\n\n Additional troubleshooting context:\n Source columns not in target: {{ schema_changes_dict['source_not_in_target'] }}\n Target columns not in source: {{ schema_changes_dict['target_not_in_source'] }}\n New column types: {{ schema_changes_dict['new_target_types'] }}\n {% endset %}\n\n {% do exceptions.raise_compiler_error(fail_msg) %}\n\n {# -- unless we ignore, run the sync operation per the config #}\n {% else %}\n\n {% do sync_column_schemas(on_schema_change, target_relation, schema_changes_dict) %}\n\n {% endif %}\n\n {% endif %}\n\n {{ 
return(schema_changes_dict['source_columns']) }}\n\n {% endif %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.check_for_schema_changes", "macro.dbt.sync_column_schemas"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.615935, "supported_languages": null}, "macro.dbt.materialization_table_default": {"name": "materialization_table_default", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/table/table.sql", "original_file_path": "macros/materializations/models/table/table.sql", "unique_id": "macro.dbt.materialization_table_default", "macro_sql": "{% materialization table, default %}\n\n {%- set existing_relation = load_cached_relation(this) -%}\n {%- set target_relation = this.incorporate(type='table') %}\n {%- set intermediate_relation = make_intermediate_relation(target_relation) -%}\n -- the intermediate_relation should not already exist in the database; get_relation\n -- will return None in that case. Otherwise, we get a relation that we can drop\n -- later, before we try to use this name for the current operation\n {%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation) -%}\n /*\n See ../view/view.sql for more information about this relation.\n */\n {%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}\n {%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}\n -- as above, the backup_relation should not already exist\n {%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}\n -- grab current tables grants config for comparision later on\n {% set grant_config = config.get('grants') %}\n\n -- drop the temp relations if they exist already in the database\n {{ drop_relation_if_exists(preexisting_intermediate_relation) }}\n {{ drop_relation_if_exists(preexisting_backup_relation) }}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n -- build model\n {% call statement('main') -%}\n {{ get_create_table_as_sql(False, intermediate_relation, sql) }}\n {%- endcall %}\n\n -- cleanup\n {% if existing_relation is not none %}\n {{ adapter.rename_relation(existing_relation, backup_relation) }}\n {% endif %}\n\n {{ adapter.rename_relation(intermediate_relation, target_relation) }}\n\n {% do create_indexes(target_relation) %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n {% set should_revoke = should_revoke(existing_relation, full_refresh_mode=True) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n -- `COMMIT` happens here\n {{ adapter.commit() }}\n\n -- finally, drop the existing/backup relation after the commit\n {{ drop_relation_if_exists(backup_relation) }}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n{% endmaterialization %}", "depends_on": {"macros": ["macro.dbt.load_cached_relation", "macro.dbt.make_intermediate_relation", "macro.dbt.make_backup_relation", "macro.dbt.drop_relation_if_exists", "macro.dbt.run_hooks", "macro.dbt.statement", "macro.dbt.get_create_table_as_sql", "macro.dbt.create_indexes", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, 
"patch_path": null, "arguments": [], "created_at": 1680025829.6181371, "supported_languages": ["sql"]}, "macro.dbt.get_create_table_as_sql": {"name": "get_create_table_as_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/table/create_table_as.sql", "original_file_path": "macros/materializations/models/table/create_table_as.sql", "unique_id": "macro.dbt.get_create_table_as_sql", "macro_sql": "{% macro get_create_table_as_sql(temporary, relation, sql) -%}\n {{ adapter.dispatch('get_create_table_as_sql', 'dbt')(temporary, relation, sql) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_create_table_as_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.618577, "supported_languages": null}, "macro.dbt.default__get_create_table_as_sql": {"name": "default__get_create_table_as_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/table/create_table_as.sql", "original_file_path": "macros/materializations/models/table/create_table_as.sql", "unique_id": "macro.dbt.default__get_create_table_as_sql", "macro_sql": "{% macro default__get_create_table_as_sql(temporary, relation, sql) -%}\n {{ return(create_table_as(temporary, relation, sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.create_table_as"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.618732, "supported_languages": null}, "macro.dbt.create_table_as": {"name": "create_table_as", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/table/create_table_as.sql", "original_file_path": "macros/materializations/models/table/create_table_as.sql", "unique_id": "macro.dbt.create_table_as", "macro_sql": "{% macro create_table_as(temporary, relation, compiled_code, language='sql') -%}\n {# backward compatibility for create_table_as that does not support language #}\n {% if language == \"sql\" %}\n {{ adapter.dispatch('create_table_as', 'dbt')(temporary, relation, compiled_code)}}\n {% else %}\n {{ adapter.dispatch('create_table_as', 'dbt')(temporary, relation, compiled_code, language) }}\n {% endif %}\n\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__create_table_as"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.619096, "supported_languages": null}, "macro.dbt.default__create_table_as": {"name": "default__create_table_as", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/table/create_table_as.sql", "original_file_path": "macros/materializations/models/table/create_table_as.sql", "unique_id": "macro.dbt.default__create_table_as", "macro_sql": "{% macro default__create_table_as(temporary, relation, sql) -%}\n {%- set sql_header = config.get('sql_header', none) -%}\n\n {{ sql_header if sql_header is not none }}\n\n create {% if temporary: -%}temporary{%- endif %} table\n {{ relation.include(database=(not temporary), schema=(not temporary)) }}\n as (\n {{ sql }}\n );\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.61945, "supported_languages": null}, "macro.dbt.materialization_view_default": {"name": "materialization_view_default", 
"resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/view.sql", "original_file_path": "macros/materializations/models/view/view.sql", "unique_id": "macro.dbt.materialization_view_default", "macro_sql": "{%- materialization view, default -%}\n\n {%- set existing_relation = load_cached_relation(this) -%}\n {%- set target_relation = this.incorporate(type='view') -%}\n {%- set intermediate_relation = make_intermediate_relation(target_relation) -%}\n\n -- the intermediate_relation should not already exist in the database; get_relation\n -- will return None in that case. Otherwise, we get a relation that we can drop\n -- later, before we try to use this name for the current operation\n {%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation) -%}\n /*\n This relation (probably) doesn't exist yet. If it does exist, it's a leftover from\n a previous run, and we're going to try to drop it immediately. At the end of this\n materialization, we're going to rename the \"existing_relation\" to this identifier,\n and then we're going to drop it. In order to make sure we run the correct one of:\n - drop view ...\n - drop table ...\n\n We need to set the type of this relation to be the type of the existing_relation, if it exists,\n or else \"view\" as a sane default if it does not. Note that if the existing_relation does not\n exist, then there is nothing to move out of the way and subsequentally drop. In that case,\n this relation will be effectively unused.\n */\n {%- set backup_relation_type = 'view' if existing_relation is none else existing_relation.type -%}\n {%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}\n -- as above, the backup_relation should not already exist\n {%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}\n -- grab current tables grants config for comparision later on\n {% set grant_config = config.get('grants') %}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- drop the temp relations if they exist already in the database\n {{ drop_relation_if_exists(preexisting_intermediate_relation) }}\n {{ drop_relation_if_exists(preexisting_backup_relation) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n -- build model\n {% call statement('main') -%}\n {{ get_create_view_as_sql(intermediate_relation, sql) }}\n {%- endcall %}\n\n -- cleanup\n -- move the existing view out of the way\n {% if existing_relation is not none %}\n {{ adapter.rename_relation(existing_relation, backup_relation) }}\n {% endif %}\n {{ adapter.rename_relation(intermediate_relation, target_relation) }}\n\n {% set should_revoke = should_revoke(existing_relation, full_refresh_mode=True) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n {{ adapter.commit() }}\n\n {{ drop_relation_if_exists(backup_relation) }}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{%- endmaterialization -%}", "depends_on": {"macros": ["macro.dbt.load_cached_relation", "macro.dbt.make_intermediate_relation", "macro.dbt.make_backup_relation", "macro.dbt.run_hooks", "macro.dbt.drop_relation_if_exists", "macro.dbt.statement", "macro.dbt.get_create_view_as_sql", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs"]}, "description": 
"", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.62165, "supported_languages": ["sql"]}, "macro.dbt.handle_existing_table": {"name": "handle_existing_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/helpers.sql", "original_file_path": "macros/materializations/models/view/helpers.sql", "unique_id": "macro.dbt.handle_existing_table", "macro_sql": "{% macro handle_existing_table(full_refresh, old_relation) %}\n {{ adapter.dispatch('handle_existing_table', 'dbt')(full_refresh, old_relation) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__handle_existing_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.621908, "supported_languages": null}, "macro.dbt.default__handle_existing_table": {"name": "default__handle_existing_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/helpers.sql", "original_file_path": "macros/materializations/models/view/helpers.sql", "unique_id": "macro.dbt.default__handle_existing_table", "macro_sql": "{% macro default__handle_existing_table(full_refresh, old_relation) %}\n {{ log(\"Dropping relation \" ~ old_relation ~ \" because it is of type \" ~ old_relation.type) }}\n {{ adapter.drop_relation(old_relation) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6220968, "supported_languages": null}, "macro.dbt.create_or_replace_view": {"name": "create_or_replace_view", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/create_or_replace_view.sql", "original_file_path": "macros/materializations/models/view/create_or_replace_view.sql", "unique_id": "macro.dbt.create_or_replace_view", "macro_sql": "{% macro create_or_replace_view() %}\n {%- set identifier = model['alias'] -%}\n\n {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}\n {%- set exists_as_view = (old_relation is not none and old_relation.is_view) -%}\n\n {%- set target_relation = api.Relation.create(\n identifier=identifier, schema=schema, database=database,\n type='view') -%}\n {% set grant_config = config.get('grants') %}\n\n {{ run_hooks(pre_hooks) }}\n\n -- If there's a table with the same name and we weren't told to full refresh,\n -- that's an error. If we were told to full refresh, drop it. 
This behavior differs\n -- for Snowflake and BigQuery, so multiple dispatch is used.\n {%- if old_relation is not none and old_relation.is_table -%}\n {{ handle_existing_table(should_full_refresh(), old_relation) }}\n {%- endif -%}\n\n -- build model\n {% call statement('main') -%}\n {{ get_create_view_as_sql(target_relation, sql) }}\n {%- endcall %}\n\n {% set should_revoke = should_revoke(exists_as_view, full_refresh_mode=True) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=True) %}\n\n {{ run_hooks(post_hooks) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.run_hooks", "macro.dbt.handle_existing_table", "macro.dbt.should_full_refresh", "macro.dbt.statement", "macro.dbt.get_create_view_as_sql", "macro.dbt.should_revoke", "macro.dbt.apply_grants"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.623378, "supported_languages": null}, "macro.dbt.get_create_view_as_sql": {"name": "get_create_view_as_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/create_view_as.sql", "original_file_path": "macros/materializations/models/view/create_view_as.sql", "unique_id": "macro.dbt.get_create_view_as_sql", "macro_sql": "{% macro get_create_view_as_sql(relation, sql) -%}\n {{ adapter.dispatch('get_create_view_as_sql', 'dbt')(relation, sql) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_create_view_as_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.623689, "supported_languages": null}, "macro.dbt.default__get_create_view_as_sql": {"name": "default__get_create_view_as_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/create_view_as.sql", "original_file_path": "macros/materializations/models/view/create_view_as.sql", "unique_id": "macro.dbt.default__get_create_view_as_sql", "macro_sql": "{% macro default__get_create_view_as_sql(relation, sql) -%}\n {{ return(create_view_as(relation, sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.create_view_as"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.623824, "supported_languages": null}, "macro.dbt.create_view_as": {"name": "create_view_as", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/create_view_as.sql", "original_file_path": "macros/materializations/models/view/create_view_as.sql", "unique_id": "macro.dbt.create_view_as", "macro_sql": "{% macro create_view_as(relation, sql) -%}\n {{ adapter.dispatch('create_view_as', 'dbt')(relation, sql) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__create_view_as"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.623976, "supported_languages": null}, "macro.dbt.default__create_view_as": {"name": "default__create_view_as", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/models/view/create_view_as.sql", "original_file_path": "macros/materializations/models/view/create_view_as.sql", "unique_id": "macro.dbt.default__create_view_as", "macro_sql": "{% macro default__create_view_as(relation, sql) -%}\n {%- set sql_header = 
config.get('sql_header', none) -%}\n\n {{ sql_header if sql_header is not none }}\n create view {{ relation }} as (\n {{ sql }}\n );\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.624196, "supported_languages": null}, "macro.dbt.materialization_seed_default": {"name": "materialization_seed_default", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/seed.sql", "original_file_path": "macros/materializations/seeds/seed.sql", "unique_id": "macro.dbt.materialization_seed_default", "macro_sql": "{% materialization seed, default %}\n\n {%- set identifier = model['alias'] -%}\n {%- set full_refresh_mode = (should_full_refresh()) -%}\n\n {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}\n\n {%- set exists_as_table = (old_relation is not none and old_relation.is_table) -%}\n {%- set exists_as_view = (old_relation is not none and old_relation.is_view) -%}\n\n {%- set grant_config = config.get('grants') -%}\n {%- set agate_table = load_agate_table() -%}\n -- grab current tables grants config for comparision later on\n\n {%- do store_result('agate_table', response='OK', agate_table=agate_table) -%}\n\n {{ run_hooks(pre_hooks, inside_transaction=False) }}\n\n -- `BEGIN` happens here:\n {{ run_hooks(pre_hooks, inside_transaction=True) }}\n\n -- build model\n {% set create_table_sql = \"\" %}\n {% if exists_as_view %}\n {{ exceptions.raise_compiler_error(\"Cannot seed to '{}', it is a view\".format(old_relation)) }}\n {% elif exists_as_table %}\n {% set create_table_sql = reset_csv_table(model, full_refresh_mode, old_relation, agate_table) %}\n {% else %}\n {% set create_table_sql = create_csv_table(model, agate_table) %}\n {% endif %}\n\n {% set code = 'CREATE' if full_refresh_mode else 'INSERT' %}\n {% set rows_affected = (agate_table.rows | length) %}\n {% set sql = load_csv_rows(model, agate_table) %}\n\n {% call noop_statement('main', code ~ ' ' ~ rows_affected, code, rows_affected) %}\n {{ get_csv_sql(create_table_sql, sql) }};\n {% endcall %}\n\n {% set target_relation = this.incorporate(type='table') %}\n\n {% set should_revoke = should_revoke(old_relation, full_refresh_mode) %}\n {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}\n\n {% do persist_docs(target_relation, model) %}\n\n {% if full_refresh_mode or not exists_as_table %}\n {% do create_indexes(target_relation) %}\n {% endif %}\n\n {{ run_hooks(post_hooks, inside_transaction=True) }}\n\n -- `COMMIT` happens here\n {{ adapter.commit() }}\n\n {{ run_hooks(post_hooks, inside_transaction=False) }}\n\n {{ return({'relations': [target_relation]}) }}\n\n{% endmaterialization %}", "depends_on": {"macros": ["macro.dbt.should_full_refresh", "macro.dbt.run_hooks", "macro.dbt.reset_csv_table", "macro.dbt.create_csv_table", "macro.dbt.load_csv_rows", "macro.dbt.noop_statement", "macro.dbt.get_csv_sql", "macro.dbt.should_revoke", "macro.dbt.apply_grants", "macro.dbt.persist_docs", "macro.dbt.create_indexes"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.627002, "supported_languages": ["sql"]}, "macro.dbt.create_csv_table": {"name": "create_csv_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": 
"macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.create_csv_table", "macro_sql": "{% macro create_csv_table(model, agate_table) -%}\n {{ adapter.dispatch('create_csv_table', 'dbt')(model, agate_table) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__create_csv_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.631079, "supported_languages": null}, "macro.dbt.default__create_csv_table": {"name": "default__create_csv_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.default__create_csv_table", "macro_sql": "{% macro default__create_csv_table(model, agate_table) %}\n {%- set column_override = model['config'].get('column_types', {}) -%}\n {%- set quote_seed_column = model['config'].get('quote_columns', None) -%}\n\n {% set sql %}\n create table {{ this.render() }} (\n {%- for col_name in agate_table.column_names -%}\n {%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%}\n {%- set type = column_override.get(col_name, inferred_type) -%}\n {%- set column_name = (col_name | string) -%}\n {{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%}\n {%- endfor -%}\n )\n {% endset %}\n\n {% call statement('_') -%}\n {{ sql }}\n {%- endcall %}\n\n {{ return(sql) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.63184, "supported_languages": null}, "macro.dbt.reset_csv_table": {"name": "reset_csv_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.reset_csv_table", "macro_sql": "{% macro reset_csv_table(model, full_refresh, old_relation, agate_table) -%}\n {{ adapter.dispatch('reset_csv_table', 'dbt')(model, full_refresh, old_relation, agate_table) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__reset_csv_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.632041, "supported_languages": null}, "macro.dbt.default__reset_csv_table": {"name": "default__reset_csv_table", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.default__reset_csv_table", "macro_sql": "{% macro default__reset_csv_table(model, full_refresh, old_relation, agate_table) %}\n {% set sql = \"\" %}\n {% if full_refresh %}\n {{ adapter.drop_relation(old_relation) }}\n {% set sql = create_csv_table(model, agate_table) %}\n {% else %}\n {{ adapter.truncate_relation(old_relation) }}\n {% set sql = \"truncate table \" ~ old_relation %}\n {% endif %}\n\n {{ return(sql) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.create_csv_table"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.632442, "supported_languages": null}, "macro.dbt.get_csv_sql": {"name": "get_csv_sql", "resource_type": "macro", "package_name": "dbt", "path": 
"macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.get_csv_sql", "macro_sql": "{% macro get_csv_sql(create_or_truncate_sql, insert_sql) %}\n {{ adapter.dispatch('get_csv_sql', 'dbt')(create_or_truncate_sql, insert_sql) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_csv_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.632604, "supported_languages": null}, "macro.dbt.default__get_csv_sql": {"name": "default__get_csv_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.default__get_csv_sql", "macro_sql": "{% macro default__get_csv_sql(create_or_truncate_sql, insert_sql) %}\n {{ create_or_truncate_sql }};\n -- dbt seed --\n {{ insert_sql }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.632715, "supported_languages": null}, "macro.dbt.get_binding_char": {"name": "get_binding_char", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.get_binding_char", "macro_sql": "{% macro get_binding_char() -%}\n {{ adapter.dispatch('get_binding_char', 'dbt')() }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_binding_char"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6328309, "supported_languages": null}, "macro.dbt.default__get_binding_char": {"name": "default__get_binding_char", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.default__get_binding_char", "macro_sql": "{% macro default__get_binding_char() %}\n {{ return('%s') }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.632926, "supported_languages": null}, "macro.dbt.get_batch_size": {"name": "get_batch_size", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.get_batch_size", "macro_sql": "{% macro get_batch_size() -%}\n {{ return(adapter.dispatch('get_batch_size', 'dbt')()) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_batch_size"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.633058, "supported_languages": null}, "macro.dbt.default__get_batch_size": {"name": "default__get_batch_size", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.default__get_batch_size", "macro_sql": "{% macro default__get_batch_size() %}\n {{ return(10000) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": 
true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.633155, "supported_languages": null}, "macro.dbt.get_seed_column_quoted_csv": {"name": "get_seed_column_quoted_csv", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.get_seed_column_quoted_csv", "macro_sql": "{% macro get_seed_column_quoted_csv(model, column_names) %}\n {%- set quote_seed_column = model['config'].get('quote_columns', None) -%}\n {% set quoted = [] %}\n {% for col in column_names -%}\n {%- do quoted.append(adapter.quote_seed_column(col, quote_seed_column)) -%}\n {%- endfor %}\n\n {%- set dest_cols_csv = quoted | join(', ') -%}\n {{ return(dest_cols_csv) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.633557, "supported_languages": null}, "macro.dbt.load_csv_rows": {"name": "load_csv_rows", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.load_csv_rows", "macro_sql": "{% macro load_csv_rows(model, agate_table) -%}\n {{ adapter.dispatch('load_csv_rows', 'dbt')(model, agate_table) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__load_csv_rows"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6337159, "supported_languages": null}, "macro.dbt.default__load_csv_rows": {"name": "default__load_csv_rows", "resource_type": "macro", "package_name": "dbt", "path": "macros/materializations/seeds/helpers.sql", "original_file_path": "macros/materializations/seeds/helpers.sql", "unique_id": "macro.dbt.default__load_csv_rows", "macro_sql": "{% macro default__load_csv_rows(model, agate_table) %}\n\n {% set batch_size = get_batch_size() %}\n\n {% set cols_sql = get_seed_column_quoted_csv(model, agate_table.column_names) %}\n {% set bindings = [] %}\n\n {% set statements = [] %}\n\n {% for chunk in agate_table.rows | batch(batch_size) %}\n {% set bindings = [] %}\n\n {% for row in chunk %}\n {% do bindings.extend(row) %}\n {% endfor %}\n\n {% set sql %}\n insert into {{ this.render() }} ({{ cols_sql }}) values\n {% for row in chunk -%}\n ({%- for column in agate_table.column_names -%}\n {{ get_binding_char() }}\n {%- if not loop.last%},{%- endif %}\n {%- endfor -%})\n {%- if not loop.last%},{%- endif %}\n {%- endfor %}\n {% endset %}\n\n {% do adapter.add_query(sql, bindings=bindings, abridge_sql_log=True) %}\n\n {% if loop.index0 == 0 %}\n {% do statements.append(sql) %}\n {% endif %}\n {% endfor %}\n\n {# Return SQL so we can render it out into the compiled files #}\n {{ return(statements[0]) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_batch_size", "macro.dbt.get_seed_column_quoted_csv", "macro.dbt.get_binding_char"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.634801, "supported_languages": null}, "macro.dbt.generate_alias_name": {"name": "generate_alias_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_alias.sql", "original_file_path": "macros/get_custom_name/get_custom_alias.sql", "unique_id": 
"macro.dbt.generate_alias_name", "macro_sql": "{% macro generate_alias_name(custom_alias_name=none, node=none) -%}\n {% do return(adapter.dispatch('generate_alias_name', 'dbt')(custom_alias_name, node)) %}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__generate_alias_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.635134, "supported_languages": null}, "macro.dbt.default__generate_alias_name": {"name": "default__generate_alias_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_alias.sql", "original_file_path": "macros/get_custom_name/get_custom_alias.sql", "unique_id": "macro.dbt.default__generate_alias_name", "macro_sql": "{% macro default__generate_alias_name(custom_alias_name=none, node=none) -%}\n\n {%- if custom_alias_name is none -%}\n\n {{ node.name }}\n\n {%- else -%}\n\n {{ custom_alias_name | trim }}\n\n {%- endif -%}\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6353269, "supported_languages": null}, "macro.dbt.generate_schema_name": {"name": "generate_schema_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_schema.sql", "original_file_path": "macros/get_custom_name/get_custom_schema.sql", "unique_id": "macro.dbt.generate_schema_name", "macro_sql": "{% macro generate_schema_name(custom_schema_name=none, node=none) -%}\n {{ return(adapter.dispatch('generate_schema_name', 'dbt')(custom_schema_name, node)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__generate_schema_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.635782, "supported_languages": null}, "macro.dbt.default__generate_schema_name": {"name": "default__generate_schema_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_schema.sql", "original_file_path": "macros/get_custom_name/get_custom_schema.sql", "unique_id": "macro.dbt.default__generate_schema_name", "macro_sql": "{% macro default__generate_schema_name(custom_schema_name, node) -%}\n\n {%- set default_schema = target.schema -%}\n {%- if custom_schema_name is none -%}\n\n {{ default_schema }}\n\n {%- else -%}\n\n {{ default_schema }}_{{ custom_schema_name | trim }}\n\n {%- endif -%}\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.636002, "supported_languages": null}, "macro.dbt.generate_schema_name_for_env": {"name": "generate_schema_name_for_env", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_schema.sql", "original_file_path": "macros/get_custom_name/get_custom_schema.sql", "unique_id": "macro.dbt.generate_schema_name_for_env", "macro_sql": "{% macro generate_schema_name_for_env(custom_schema_name, node) -%}\n\n {%- set default_schema = target.schema -%}\n {%- if target.name == 'prod' and custom_schema_name is not none -%}\n\n {{ custom_schema_name | trim }}\n\n {%- else -%}\n\n {{ default_schema }}\n\n {%- endif -%}\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": 
[], "created_at": 1680025829.636239, "supported_languages": null}, "macro.dbt.generate_database_name": {"name": "generate_database_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_database.sql", "original_file_path": "macros/get_custom_name/get_custom_database.sql", "unique_id": "macro.dbt.generate_database_name", "macro_sql": "{% macro generate_database_name(custom_database_name=none, node=none) -%}\n {% do return(adapter.dispatch('generate_database_name', 'dbt')(custom_database_name, node)) %}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__generate_database_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.636579, "supported_languages": null}, "macro.dbt.default__generate_database_name": {"name": "default__generate_database_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/get_custom_name/get_custom_database.sql", "original_file_path": "macros/get_custom_name/get_custom_database.sql", "unique_id": "macro.dbt.default__generate_database_name", "macro_sql": "{% macro default__generate_database_name(custom_database_name=none, node=none) -%}\n {%- set default_database = target.database -%}\n {%- if custom_database_name is none -%}\n\n {{ default_database }}\n\n {%- else -%}\n\n {{ custom_database_name }}\n\n {%- endif -%}\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.636798, "supported_languages": null}, "macro.dbt.default__test_relationships": {"name": "default__test_relationships", "resource_type": "macro", "package_name": "dbt", "path": "macros/generic_test_sql/relationships.sql", "original_file_path": "macros/generic_test_sql/relationships.sql", "unique_id": "macro.dbt.default__test_relationships", "macro_sql": "{% macro default__test_relationships(model, column_name, to, field) %}\n\nwith child as (\n select {{ column_name }} as from_field\n from {{ model }}\n where {{ column_name }} is not null\n),\n\nparent as (\n select {{ field }} as to_field\n from {{ to }}\n)\n\nselect\n from_field\n\nfrom child\nleft join parent\n on child.from_field = parent.to_field\n\nwhere parent.to_field is null\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6371, "supported_languages": null}, "macro.dbt.default__test_not_null": {"name": "default__test_not_null", "resource_type": "macro", "package_name": "dbt", "path": "macros/generic_test_sql/not_null.sql", "original_file_path": "macros/generic_test_sql/not_null.sql", "unique_id": "macro.dbt.default__test_not_null", "macro_sql": "{% macro default__test_not_null(model, column_name) %}\n\n{% set column_list = '*' if should_store_failures() else column_name %}\n\nselect {{ column_list }}\nfrom {{ model }}\nwhere {{ column_name }} is null\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.should_store_failures"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.637353, "supported_languages": null}, "macro.dbt.default__test_unique": {"name": "default__test_unique", "resource_type": "macro", "package_name": "dbt", "path": "macros/generic_test_sql/unique.sql", "original_file_path": 
"macros/generic_test_sql/unique.sql", "unique_id": "macro.dbt.default__test_unique", "macro_sql": "{% macro default__test_unique(model, column_name) %}\n\nselect\n {{ column_name }} as unique_field,\n count(*) as n_records\n\nfrom {{ model }}\nwhere {{ column_name }} is not null\ngroup by {{ column_name }}\nhaving count(*) > 1\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.637565, "supported_languages": null}, "macro.dbt.default__test_accepted_values": {"name": "default__test_accepted_values", "resource_type": "macro", "package_name": "dbt", "path": "macros/generic_test_sql/accepted_values.sql", "original_file_path": "macros/generic_test_sql/accepted_values.sql", "unique_id": "macro.dbt.default__test_accepted_values", "macro_sql": "{% macro default__test_accepted_values(model, column_name, values, quote=True) %}\n\nwith all_values as (\n\n select\n {{ column_name }} as value_field,\n count(*) as n_records\n\n from {{ model }}\n group by {{ column_name }}\n\n)\n\nselect *\nfrom all_values\nwhere value_field not in (\n {% for value in values -%}\n {% if quote -%}\n '{{ value }}'\n {%- else -%}\n {{ value }}\n {%- endif -%}\n {%- if not loop.last -%},{%- endif %}\n {%- endfor %}\n)\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6380422, "supported_languages": null}, "macro.dbt.statement": {"name": "statement", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/statement.sql", "original_file_path": "macros/etc/statement.sql", "unique_id": "macro.dbt.statement", "macro_sql": "\n{%- macro statement(name=None, fetch_result=False, auto_begin=True, language='sql') -%}\n {%- if execute: -%}\n {%- set compiled_code = caller() -%}\n\n {%- if name == 'main' -%}\n {{ log('Writing runtime {} for node \"{}\"'.format(language, model['unique_id'])) }}\n {{ write(compiled_code) }}\n {%- endif -%}\n {%- if language == 'sql'-%}\n {%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result) -%}\n {%- elif language == 'python' -%}\n {%- set res = submit_python_job(model, compiled_code) -%}\n {#-- TODO: What should table be for python models? 
--#}\n {%- set table = None -%}\n {%- else -%}\n {% do exceptions.raise_compiler_error(\"statement macro didn't get supported language\") %}\n {%- endif -%}\n\n {%- if name is not none -%}\n {{ store_result(name, response=res, agate_table=table) }}\n {%- endif -%}\n\n {%- endif -%}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.639293, "supported_languages": null}, "macro.dbt.noop_statement": {"name": "noop_statement", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/statement.sql", "original_file_path": "macros/etc/statement.sql", "unique_id": "macro.dbt.noop_statement", "macro_sql": "{% macro noop_statement(name=None, message=None, code=None, rows_affected=None, res=None) -%}\n {%- set sql = caller() -%}\n\n {%- if name == 'main' -%}\n {{ log('Writing runtime SQL for node \"{}\"'.format(model['unique_id'])) }}\n {{ write(sql) }}\n {%- endif -%}\n\n {%- if name is not none -%}\n {{ store_raw_result(name, message=message, code=code, rows_affected=rows_affected, agate_table=res) }}\n {%- endif -%}\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6397882, "supported_languages": null}, "macro.dbt.run_query": {"name": "run_query", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/statement.sql", "original_file_path": "macros/etc/statement.sql", "unique_id": "macro.dbt.run_query", "macro_sql": "{% macro run_query(sql) %}\n {% call statement(\"run_query_statement\", fetch_result=true, auto_begin=false) %}\n {{ sql }}\n {% endcall %}\n\n {% do return(load_result(\"run_query_statement\").table) %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.640039, "supported_languages": null}, "macro.dbt.convert_datetime": {"name": "convert_datetime", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/datetime.sql", "original_file_path": "macros/etc/datetime.sql", "unique_id": "macro.dbt.convert_datetime", "macro_sql": "{% macro convert_datetime(date_str, date_fmt) %}\n\n {% set error_msg -%}\n The provided partition date '{{ date_str }}' does not match the expected format '{{ date_fmt }}'\n {%- endset %}\n\n {% set res = try_or_compiler_error(error_msg, modules.datetime.datetime.strptime, date_str.strip(), date_fmt) %}\n {{ return(res) }}\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.641572, "supported_languages": null}, "macro.dbt.dates_in_range": {"name": "dates_in_range", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/datetime.sql", "original_file_path": "macros/etc/datetime.sql", "unique_id": "macro.dbt.dates_in_range", "macro_sql": "{% macro dates_in_range(start_date_str, end_date_str=none, in_fmt=\"%Y%m%d\", out_fmt=\"%Y%m%d\") %}\n {% set end_date_str = start_date_str if end_date_str is none else end_date_str %}\n\n {% set start_date = convert_datetime(start_date_str, in_fmt) %}\n {% set end_date = convert_datetime(end_date_str, in_fmt) %}\n\n {% set day_count = (end_date - start_date).days %}\n {% if day_count < 0 %}\n {% set msg -%}\n Partiton start date 
is after the end date ({{ start_date }}, {{ end_date }})\n {%- endset %}\n\n {{ exceptions.raise_compiler_error(msg, model) }}\n {% endif %}\n\n {% set date_list = [] %}\n {% for i in range(0, day_count + 1) %}\n {% set the_date = (modules.datetime.timedelta(days=i) + start_date) %}\n {% if not out_fmt %}\n {% set _ = date_list.append(the_date) %}\n {% else %}\n {% set _ = date_list.append(the_date.strftime(out_fmt)) %}\n {% endif %}\n {% endfor %}\n\n {{ return(date_list) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.convert_datetime"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.642743, "supported_languages": null}, "macro.dbt.partition_range": {"name": "partition_range", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/datetime.sql", "original_file_path": "macros/etc/datetime.sql", "unique_id": "macro.dbt.partition_range", "macro_sql": "{% macro partition_range(raw_partition_date, date_fmt='%Y%m%d') %}\n {% set partition_range = (raw_partition_date | string).split(\",\") %}\n\n {% if (partition_range | length) == 1 %}\n {% set start_date = partition_range[0] %}\n {% set end_date = none %}\n {% elif (partition_range | length) == 2 %}\n {% set start_date = partition_range[0] %}\n {% set end_date = partition_range[1] %}\n {% else %}\n {{ exceptions.raise_compiler_error(\"Invalid partition time. Expected format: {Start Date}[,{End Date}]. Got: \" ~ raw_partition_date) }}\n {% endif %}\n\n {{ return(dates_in_range(start_date, end_date, in_fmt=date_fmt)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.dates_in_range"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6433878, "supported_languages": null}, "macro.dbt.py_current_timestring": {"name": "py_current_timestring", "resource_type": "macro", "package_name": "dbt", "path": "macros/etc/datetime.sql", "original_file_path": "macros/etc/datetime.sql", "unique_id": "macro.dbt.py_current_timestring", "macro_sql": "{% macro py_current_timestring() %}\n {% set dt = modules.datetime.datetime.now() %}\n {% do return(dt.strftime(\"%Y%m%d%H%M%S%f\")) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6435862, "supported_languages": null}, "macro.dbt.except": {"name": "except", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/except.sql", "original_file_path": "macros/utils/except.sql", "unique_id": "macro.dbt.except", "macro_sql": "{% macro except() %}\n {{ return(adapter.dispatch('except', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__except"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6437812, "supported_languages": null}, "macro.dbt.default__except": {"name": "default__except", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/except.sql", "original_file_path": "macros/utils/except.sql", "unique_id": "macro.dbt.default__except", "macro_sql": "{% macro default__except() %}\n\n except\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.643842, "supported_languages": null}, "macro.dbt.replace": 
{"name": "replace", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/replace.sql", "original_file_path": "macros/utils/replace.sql", "unique_id": "macro.dbt.replace", "macro_sql": "{% macro replace(field, old_chars, new_chars) -%}\n {{ return(adapter.dispatch('replace', 'dbt') (field, old_chars, new_chars)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__replace"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6441212, "supported_languages": null}, "macro.dbt.default__replace": {"name": "default__replace", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/replace.sql", "original_file_path": "macros/utils/replace.sql", "unique_id": "macro.dbt.default__replace", "macro_sql": "{% macro default__replace(field, old_chars, new_chars) %}\n\n replace(\n {{ field }},\n {{ old_chars }},\n {{ new_chars }}\n )\n\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.644257, "supported_languages": null}, "macro.dbt.concat": {"name": "concat", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/concat.sql", "original_file_path": "macros/utils/concat.sql", "unique_id": "macro.dbt.concat", "macro_sql": "{% macro concat(fields) -%}\n {{ return(adapter.dispatch('concat', 'dbt')(fields)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__concat"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.644465, "supported_languages": null}, "macro.dbt.default__concat": {"name": "default__concat", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/concat.sql", "original_file_path": "macros/utils/concat.sql", "unique_id": "macro.dbt.default__concat", "macro_sql": "{% macro default__concat(fields) -%}\n {{ fields|join(' || ') }}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.64457, "supported_languages": null}, "macro.dbt.length": {"name": "length", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/length.sql", "original_file_path": "macros/utils/length.sql", "unique_id": "macro.dbt.length", "macro_sql": "{% macro length(expression) -%}\n {{ return(adapter.dispatch('length', 'dbt') (expression)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__length"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.644785, "supported_languages": null}, "macro.dbt.default__length": {"name": "default__length", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/length.sql", "original_file_path": "macros/utils/length.sql", "unique_id": "macro.dbt.default__length", "macro_sql": "{% macro default__length(expression) %}\n\n length(\n {{ expression }}\n )\n\n{%- endmacro -%}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.644872, "supported_languages": null}, "macro.dbt.dateadd": {"name": "dateadd", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/dateadd.sql", "original_file_path": 
"macros/utils/dateadd.sql", "unique_id": "macro.dbt.dateadd", "macro_sql": "{% macro dateadd(datepart, interval, from_date_or_timestamp) %}\n {{ return(adapter.dispatch('dateadd', 'dbt')(datepart, interval, from_date_or_timestamp)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__dateadd"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6451578, "supported_languages": null}, "macro.dbt.default__dateadd": {"name": "default__dateadd", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/dateadd.sql", "original_file_path": "macros/utils/dateadd.sql", "unique_id": "macro.dbt.default__dateadd", "macro_sql": "{% macro default__dateadd(datepart, interval, from_date_or_timestamp) %}\n\n dateadd(\n {{ datepart }},\n {{ interval }},\n {{ from_date_or_timestamp }}\n )\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.645298, "supported_languages": null}, "macro.dbt.intersect": {"name": "intersect", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/intersect.sql", "original_file_path": "macros/utils/intersect.sql", "unique_id": "macro.dbt.intersect", "macro_sql": "{% macro intersect() %}\n {{ return(adapter.dispatch('intersect', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__intersect"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.64549, "supported_languages": null}, "macro.dbt.default__intersect": {"name": "default__intersect", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/intersect.sql", "original_file_path": "macros/utils/intersect.sql", "unique_id": "macro.dbt.default__intersect", "macro_sql": "{% macro default__intersect() %}\n\n intersect\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6455529, "supported_languages": null}, "macro.dbt.escape_single_quotes": {"name": "escape_single_quotes", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/escape_single_quotes.sql", "original_file_path": "macros/utils/escape_single_quotes.sql", "unique_id": "macro.dbt.escape_single_quotes", "macro_sql": "{% macro escape_single_quotes(expression) %}\n {{ return(adapter.dispatch('escape_single_quotes', 'dbt') (expression)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__escape_single_quotes"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.64578, "supported_languages": null}, "macro.dbt.default__escape_single_quotes": {"name": "default__escape_single_quotes", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/escape_single_quotes.sql", "original_file_path": "macros/utils/escape_single_quotes.sql", "unique_id": "macro.dbt.default__escape_single_quotes", "macro_sql": "{% macro default__escape_single_quotes(expression) -%}\n{{ expression | replace(\"'\",\"''\") }}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.645895, "supported_languages": null}, "macro.dbt.right": 
{"name": "right", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/right.sql", "original_file_path": "macros/utils/right.sql", "unique_id": "macro.dbt.right", "macro_sql": "{% macro right(string_text, length_expression) -%}\n {{ return(adapter.dispatch('right', 'dbt') (string_text, length_expression)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__right"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.646142, "supported_languages": null}, "macro.dbt.default__right": {"name": "default__right", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/right.sql", "original_file_path": "macros/utils/right.sql", "unique_id": "macro.dbt.default__right", "macro_sql": "{% macro default__right(string_text, length_expression) %}\n\n right(\n {{ string_text }},\n {{ length_expression }}\n )\n\n{%- endmacro -%}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.646257, "supported_languages": null}, "macro.dbt.listagg": {"name": "listagg", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/listagg.sql", "original_file_path": "macros/utils/listagg.sql", "unique_id": "macro.dbt.listagg", "macro_sql": "{% macro listagg(measure, delimiter_text=\"','\", order_by_clause=none, limit_num=none) -%}\n {{ return(adapter.dispatch('listagg', 'dbt') (measure, delimiter_text, order_by_clause, limit_num)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__listagg"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.646784, "supported_languages": null}, "macro.dbt.default__listagg": {"name": "default__listagg", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/listagg.sql", "original_file_path": "macros/utils/listagg.sql", "unique_id": "macro.dbt.default__listagg", "macro_sql": "{% macro default__listagg(measure, delimiter_text, order_by_clause, limit_num) -%}\n\n {% if limit_num -%}\n array_to_string(\n array_slice(\n array_agg(\n {{ measure }}\n ){% if order_by_clause -%}\n within group ({{ order_by_clause }})\n {%- endif %}\n ,0\n ,{{ limit_num }}\n ),\n {{ delimiter_text }}\n )\n {%- else %}\n listagg(\n {{ measure }},\n {{ delimiter_text }}\n )\n {% if order_by_clause -%}\n within group ({{ order_by_clause }})\n {%- endif %}\n {%- endif %}\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6681972, "supported_languages": null}, "macro.dbt.datediff": {"name": "datediff", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/datediff.sql", "original_file_path": "macros/utils/datediff.sql", "unique_id": "macro.dbt.datediff", "macro_sql": "{% macro datediff(first_date, second_date, datepart) %}\n {{ return(adapter.dispatch('datediff', 'dbt')(first_date, second_date, datepart)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__datediff"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.668546, "supported_languages": null}, "macro.dbt.default__datediff": {"name": "default__datediff", "resource_type": "macro", "package_name": "dbt", "path": 
"macros/utils/datediff.sql", "original_file_path": "macros/utils/datediff.sql", "unique_id": "macro.dbt.default__datediff", "macro_sql": "{% macro default__datediff(first_date, second_date, datepart) -%}\n\n datediff(\n {{ datepart }},\n {{ first_date }},\n {{ second_date }}\n )\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.668691, "supported_languages": null}, "macro.dbt.safe_cast": {"name": "safe_cast", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/safe_cast.sql", "original_file_path": "macros/utils/safe_cast.sql", "unique_id": "macro.dbt.safe_cast", "macro_sql": "{% macro safe_cast(field, type) %}\n {{ return(adapter.dispatch('safe_cast', 'dbt') (field, type)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__safe_cast"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.668951, "supported_languages": null}, "macro.dbt.default__safe_cast": {"name": "default__safe_cast", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/safe_cast.sql", "original_file_path": "macros/utils/safe_cast.sql", "unique_id": "macro.dbt.default__safe_cast", "macro_sql": "{% macro default__safe_cast(field, type) %}\n {# most databases don't support this function yet\n so we just need to use cast #}\n cast({{field}} as {{type}})\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6690712, "supported_languages": null}, "macro.dbt.hash": {"name": "hash", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/hash.sql", "original_file_path": "macros/utils/hash.sql", "unique_id": "macro.dbt.hash", "macro_sql": "{% macro hash(field) -%}\n {{ return(adapter.dispatch('hash', 'dbt') (field)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__hash"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.669299, "supported_languages": null}, "macro.dbt.default__hash": {"name": "default__hash", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/hash.sql", "original_file_path": "macros/utils/hash.sql", "unique_id": "macro.dbt.default__hash", "macro_sql": "{% macro default__hash(field) -%}\n md5(cast({{ field }} as {{ api.Column.translate_type('string') }}))\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.669434, "supported_languages": null}, "macro.dbt.cast_bool_to_text": {"name": "cast_bool_to_text", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/cast_bool_to_text.sql", "original_file_path": "macros/utils/cast_bool_to_text.sql", "unique_id": "macro.dbt.cast_bool_to_text", "macro_sql": "{% macro cast_bool_to_text(field) %}\n {{ adapter.dispatch('cast_bool_to_text', 'dbt') (field) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__cast_bool_to_text"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6696472, "supported_languages": null}, "macro.dbt.default__cast_bool_to_text": {"name": "default__cast_bool_to_text", 
"resource_type": "macro", "package_name": "dbt", "path": "macros/utils/cast_bool_to_text.sql", "original_file_path": "macros/utils/cast_bool_to_text.sql", "unique_id": "macro.dbt.default__cast_bool_to_text", "macro_sql": "{% macro default__cast_bool_to_text(field) %}\n cast({{ field }} as {{ api.Column.translate_type('string') }})\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.669782, "supported_languages": null}, "macro.dbt.any_value": {"name": "any_value", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/any_value.sql", "original_file_path": "macros/utils/any_value.sql", "unique_id": "macro.dbt.any_value", "macro_sql": "{% macro any_value(expression) -%}\n {{ return(adapter.dispatch('any_value', 'dbt') (expression)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__any_value"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.669991, "supported_languages": null}, "macro.dbt.default__any_value": {"name": "default__any_value", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/any_value.sql", "original_file_path": "macros/utils/any_value.sql", "unique_id": "macro.dbt.default__any_value", "macro_sql": "{% macro default__any_value(expression) -%}\n\n any_value({{ expression }})\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.670079, "supported_languages": null}, "macro.dbt.position": {"name": "position", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/position.sql", "original_file_path": "macros/utils/position.sql", "unique_id": "macro.dbt.position", "macro_sql": "{% macro position(substring_text, string_text) -%}\n {{ return(adapter.dispatch('position', 'dbt') (substring_text, string_text)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__position"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6703281, "supported_languages": null}, "macro.dbt.default__position": {"name": "default__position", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/position.sql", "original_file_path": "macros/utils/position.sql", "unique_id": "macro.dbt.default__position", "macro_sql": "{% macro default__position(substring_text, string_text) %}\n\n position(\n {{ substring_text }} in {{ string_text }}\n )\n\n{%- endmacro -%}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.670441, "supported_languages": null}, "macro.dbt.string_literal": {"name": "string_literal", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/literal.sql", "original_file_path": "macros/utils/literal.sql", "unique_id": "macro.dbt.string_literal", "macro_sql": "{%- macro string_literal(value) -%}\n {{ return(adapter.dispatch('string_literal', 'dbt') (value)) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__string_literal"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.670652, "supported_languages": null}, 
"macro.dbt.default__string_literal": {"name": "default__string_literal", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/literal.sql", "original_file_path": "macros/utils/literal.sql", "unique_id": "macro.dbt.default__string_literal", "macro_sql": "{% macro default__string_literal(value) -%}\n '{{ value }}'\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.670739, "supported_languages": null}, "macro.dbt.type_string": {"name": "type_string", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_string", "macro_sql": "\n\n{%- macro type_string() -%}\n {{ return(adapter.dispatch('type_string', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_string"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.671568, "supported_languages": null}, "macro.dbt.default__type_string": {"name": "default__type_string", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_string", "macro_sql": "{% macro default__type_string() %}\n {{ return(api.Column.translate_type(\"string\")) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6716979, "supported_languages": null}, "macro.dbt.type_timestamp": {"name": "type_timestamp", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_timestamp", "macro_sql": "\n\n{%- macro type_timestamp() -%}\n {{ return(adapter.dispatch('type_timestamp', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_timestamp"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.671838, "supported_languages": null}, "macro.dbt.default__type_timestamp": {"name": "default__type_timestamp", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_timestamp", "macro_sql": "{% macro default__type_timestamp() %}\n {{ return(api.Column.translate_type(\"timestamp\")) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6720278, "supported_languages": null}, "macro.dbt.type_float": {"name": "type_float", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_float", "macro_sql": "\n\n{%- macro type_float() -%}\n {{ return(adapter.dispatch('type_float', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_float"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6721652, "supported_languages": null}, 
"macro.dbt.default__type_float": {"name": "default__type_float", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_float", "macro_sql": "{% macro default__type_float() %}\n {{ return(api.Column.translate_type(\"float\")) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.672292, "supported_languages": null}, "macro.dbt.type_numeric": {"name": "type_numeric", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_numeric", "macro_sql": "\n\n{%- macro type_numeric() -%}\n {{ return(adapter.dispatch('type_numeric', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_numeric"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.672431, "supported_languages": null}, "macro.dbt.default__type_numeric": {"name": "default__type_numeric", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_numeric", "macro_sql": "{% macro default__type_numeric() %}\n {{ return(api.Column.numeric_type(\"numeric\", 28, 6)) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.672582, "supported_languages": null}, "macro.dbt.type_bigint": {"name": "type_bigint", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_bigint", "macro_sql": "\n\n{%- macro type_bigint() -%}\n {{ return(adapter.dispatch('type_bigint', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_bigint"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.672719, "supported_languages": null}, "macro.dbt.default__type_bigint": {"name": "default__type_bigint", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_bigint", "macro_sql": "{% macro default__type_bigint() %}\n {{ return(api.Column.translate_type(\"bigint\")) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.672846, "supported_languages": null}, "macro.dbt.type_int": {"name": "type_int", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_int", "macro_sql": "\n\n{%- macro type_int() -%}\n {{ return(adapter.dispatch('type_int', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_int"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.672983, "supported_languages": null}, "macro.dbt.default__type_int": {"name": 
"default__type_int", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_int", "macro_sql": "{%- macro default__type_int() -%}\n {{ return(api.Column.translate_type(\"integer\")) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.673106, "supported_languages": null}, "macro.dbt.type_boolean": {"name": "type_boolean", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.type_boolean", "macro_sql": "\n\n{%- macro type_boolean() -%}\n {{ return(adapter.dispatch('type_boolean', 'dbt')()) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.default__type_boolean"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.673244, "supported_languages": null}, "macro.dbt.default__type_boolean": {"name": "default__type_boolean", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/data_types.sql", "original_file_path": "macros/utils/data_types.sql", "unique_id": "macro.dbt.default__type_boolean", "macro_sql": "{%- macro default__type_boolean() -%}\n {{ return(api.Column.translate_type(\"boolean\")) }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.673367, "supported_languages": null}, "macro.dbt.array_concat": {"name": "array_concat", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/array_concat.sql", "original_file_path": "macros/utils/array_concat.sql", "unique_id": "macro.dbt.array_concat", "macro_sql": "{% macro array_concat(array_1, array_2) -%}\n {{ return(adapter.dispatch('array_concat', 'dbt')(array_1, array_2)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__array_concat"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.673606, "supported_languages": null}, "macro.dbt.default__array_concat": {"name": "default__array_concat", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/array_concat.sql", "original_file_path": "macros/utils/array_concat.sql", "unique_id": "macro.dbt.default__array_concat", "macro_sql": "{% macro default__array_concat(array_1, array_2) -%}\n array_cat({{ array_1 }}, {{ array_2 }})\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.673718, "supported_languages": null}, "macro.dbt.bool_or": {"name": "bool_or", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/bool_or.sql", "original_file_path": "macros/utils/bool_or.sql", "unique_id": "macro.dbt.bool_or", "macro_sql": "{% macro bool_or(expression) -%}\n {{ return(adapter.dispatch('bool_or', 'dbt') (expression)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__bool_or"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.673928, "supported_languages": null}, "macro.dbt.default__bool_or": 
{"name": "default__bool_or", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/bool_or.sql", "original_file_path": "macros/utils/bool_or.sql", "unique_id": "macro.dbt.default__bool_or", "macro_sql": "{% macro default__bool_or(expression) -%}\n\n bool_or({{ expression }})\n\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.674016, "supported_languages": null}, "macro.dbt.last_day": {"name": "last_day", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/last_day.sql", "original_file_path": "macros/utils/last_day.sql", "unique_id": "macro.dbt.last_day", "macro_sql": "{% macro last_day(date, datepart) %}\n {{ return(adapter.dispatch('last_day', 'dbt') (date, datepart)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__last_day"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.674309, "supported_languages": null}, "macro.dbt.default_last_day": {"name": "default_last_day", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/last_day.sql", "original_file_path": "macros/utils/last_day.sql", "unique_id": "macro.dbt.default_last_day", "macro_sql": "\n\n{%- macro default_last_day(date, datepart) -%}\n cast(\n {{dbt.dateadd('day', '-1',\n dbt.dateadd(datepart, '1', dbt.date_trunc(datepart, date))\n )}}\n as date)\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt.dateadd", "macro.dbt.date_trunc"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.674582, "supported_languages": null}, "macro.dbt.default__last_day": {"name": "default__last_day", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/last_day.sql", "original_file_path": "macros/utils/last_day.sql", "unique_id": "macro.dbt.default__last_day", "macro_sql": "{% macro default__last_day(date, datepart) -%}\n {{dbt.default_last_day(date, datepart)}}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default_last_day"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.674709, "supported_languages": null}, "macro.dbt.split_part": {"name": "split_part", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/split_part.sql", "original_file_path": "macros/utils/split_part.sql", "unique_id": "macro.dbt.split_part", "macro_sql": "{% macro split_part(string_text, delimiter_text, part_number) %}\n {{ return(adapter.dispatch('split_part', 'dbt') (string_text, delimiter_text, part_number)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__split_part"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.675154, "supported_languages": null}, "macro.dbt.default__split_part": {"name": "default__split_part", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/split_part.sql", "original_file_path": "macros/utils/split_part.sql", "unique_id": "macro.dbt.default__split_part", "macro_sql": "{% macro default__split_part(string_text, delimiter_text, part_number) %}\n\n split_part(\n {{ string_text }},\n {{ delimiter_text }},\n {{ part_number }}\n )\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": 
"", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.675293, "supported_languages": null}, "macro.dbt._split_part_negative": {"name": "_split_part_negative", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/split_part.sql", "original_file_path": "macros/utils/split_part.sql", "unique_id": "macro.dbt._split_part_negative", "macro_sql": "{% macro _split_part_negative(string_text, delimiter_text, part_number) %}\n\n split_part(\n {{ string_text }},\n {{ delimiter_text }},\n length({{ string_text }})\n - length(\n replace({{ string_text }}, {{ delimiter_text }}, '')\n ) + 2 {{ part_number }}\n )\n\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.67554, "supported_languages": null}, "macro.dbt.date_trunc": {"name": "date_trunc", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/date_trunc.sql", "original_file_path": "macros/utils/date_trunc.sql", "unique_id": "macro.dbt.date_trunc", "macro_sql": "{% macro date_trunc(datepart, date) -%}\n {{ return(adapter.dispatch('date_trunc', 'dbt') (datepart, date)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__date_trunc"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.675781, "supported_languages": null}, "macro.dbt.default__date_trunc": {"name": "default__date_trunc", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/date_trunc.sql", "original_file_path": "macros/utils/date_trunc.sql", "unique_id": "macro.dbt.default__date_trunc", "macro_sql": "{% macro default__date_trunc(datepart, date) -%}\n date_trunc('{{datepart}}', {{date}})\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.67589, "supported_languages": null}, "macro.dbt.array_construct": {"name": "array_construct", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/array_construct.sql", "original_file_path": "macros/utils/array_construct.sql", "unique_id": "macro.dbt.array_construct", "macro_sql": "{% macro array_construct(inputs=[], data_type=api.Column.translate_type('integer')) -%}\n {{ return(adapter.dispatch('array_construct', 'dbt')(inputs, data_type)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__array_construct"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.676214, "supported_languages": null}, "macro.dbt.default__array_construct": {"name": "default__array_construct", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/array_construct.sql", "original_file_path": "macros/utils/array_construct.sql", "unique_id": "macro.dbt.default__array_construct", "macro_sql": "{% macro default__array_construct(inputs, data_type) -%}\n {% if inputs|length > 0 %}\n array[ {{ inputs|join(' , ') }} ]\n {% else %}\n array[]::{{data_type}}[]\n {% endif %}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.676425, "supported_languages": null}, "macro.dbt.array_append": {"name": "array_append", "resource_type": "macro", 
"package_name": "dbt", "path": "macros/utils/array_append.sql", "original_file_path": "macros/utils/array_append.sql", "unique_id": "macro.dbt.array_append", "macro_sql": "{% macro array_append(array, new_element) -%}\n {{ return(adapter.dispatch('array_append', 'dbt')(array, new_element)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__array_append"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.676668, "supported_languages": null}, "macro.dbt.default__array_append": {"name": "default__array_append", "resource_type": "macro", "package_name": "dbt", "path": "macros/utils/array_append.sql", "original_file_path": "macros/utils/array_append.sql", "unique_id": "macro.dbt.default__array_append", "macro_sql": "{% macro default__array_append(array, new_element) -%}\n array_append({{ array }}, {{ new_element }})\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.676778, "supported_languages": null}, "macro.dbt.create_schema": {"name": "create_schema", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/schema.sql", "original_file_path": "macros/adapters/schema.sql", "unique_id": "macro.dbt.create_schema", "macro_sql": "{% macro create_schema(relation) -%}\n {{ adapter.dispatch('create_schema', 'dbt')(relation) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__create_schema"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6771042, "supported_languages": null}, "macro.dbt.default__create_schema": {"name": "default__create_schema", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/schema.sql", "original_file_path": "macros/adapters/schema.sql", "unique_id": "macro.dbt.default__create_schema", "macro_sql": "{% macro default__create_schema(relation) -%}\n {%- call statement('create_schema') -%}\n create schema if not exists {{ relation.without_identifier() }}\n {% endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.677264, "supported_languages": null}, "macro.dbt.drop_schema": {"name": "drop_schema", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/schema.sql", "original_file_path": "macros/adapters/schema.sql", "unique_id": "macro.dbt.drop_schema", "macro_sql": "{% macro drop_schema(relation) -%}\n {{ adapter.dispatch('drop_schema', 'dbt')(relation) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__drop_schema"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.677402, "supported_languages": null}, "macro.dbt.default__drop_schema": {"name": "default__drop_schema", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/schema.sql", "original_file_path": "macros/adapters/schema.sql", "unique_id": "macro.dbt.default__drop_schema", "macro_sql": "{% macro default__drop_schema(relation) -%}\n {%- call statement('drop_schema') -%}\n drop schema if exists {{ relation.without_identifier() }} cascade\n {% endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, 
"description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.677574, "supported_languages": null}, "macro.dbt.current_timestamp": {"name": "current_timestamp", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.current_timestamp", "macro_sql": "{%- macro current_timestamp() -%}\n {{ adapter.dispatch('current_timestamp', 'dbt')() }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__current_timestamp"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.677997, "supported_languages": null}, "macro.dbt.default__current_timestamp": {"name": "default__current_timestamp", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.default__current_timestamp", "macro_sql": "{% macro default__current_timestamp() -%}\n {{ exceptions.raise_not_implemented(\n 'current_timestamp macro not implemented for adapter ' + adapter.type()) }}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6781251, "supported_languages": null}, "macro.dbt.snapshot_get_time": {"name": "snapshot_get_time", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.snapshot_get_time", "macro_sql": "\n\n{%- macro snapshot_get_time() -%}\n {{ adapter.dispatch('snapshot_get_time', 'dbt')() }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__snapshot_get_time"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.678248, "supported_languages": null}, "macro.dbt.default__snapshot_get_time": {"name": "default__snapshot_get_time", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.default__snapshot_get_time", "macro_sql": "{% macro default__snapshot_get_time() %}\n {{ current_timestamp() }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.current_timestamp"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.678338, "supported_languages": null}, "macro.dbt.current_timestamp_backcompat": {"name": "current_timestamp_backcompat", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.current_timestamp_backcompat", "macro_sql": "{% macro current_timestamp_backcompat() %}\n {{ return(adapter.dispatch('current_timestamp_backcompat', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__current_timestamp_backcompat"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.678478, "supported_languages": null}, "macro.dbt.default__current_timestamp_backcompat": {"name": "default__current_timestamp_backcompat", "resource_type": "macro", 
"package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.default__current_timestamp_backcompat", "macro_sql": "{% macro default__current_timestamp_backcompat() %}\n current_timestamp::timestamp\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.678539, "supported_languages": null}, "macro.dbt.current_timestamp_in_utc_backcompat": {"name": "current_timestamp_in_utc_backcompat", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.current_timestamp_in_utc_backcompat", "macro_sql": "{% macro current_timestamp_in_utc_backcompat() %}\n {{ return(adapter.dispatch('current_timestamp_in_utc_backcompat', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__current_timestamp_in_utc_backcompat"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6786802, "supported_languages": null}, "macro.dbt.default__current_timestamp_in_utc_backcompat": {"name": "default__current_timestamp_in_utc_backcompat", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/timestamps.sql", "original_file_path": "macros/adapters/timestamps.sql", "unique_id": "macro.dbt.default__current_timestamp_in_utc_backcompat", "macro_sql": "{% macro default__current_timestamp_in_utc_backcompat() %}\n {{ return(adapter.dispatch('current_timestamp_backcompat', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.current_timestamp_backcompat", "macro.dbt.default__current_timestamp_backcompat"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.678824, "supported_languages": null}, "macro.dbt.get_create_index_sql": {"name": "get_create_index_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/indexes.sql", "original_file_path": "macros/adapters/indexes.sql", "unique_id": "macro.dbt.get_create_index_sql", "macro_sql": "{% macro get_create_index_sql(relation, index_dict) -%}\n {{ return(adapter.dispatch('get_create_index_sql', 'dbt')(relation, index_dict)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_create_index_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6792178, "supported_languages": null}, "macro.dbt.default__get_create_index_sql": {"name": "default__get_create_index_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/indexes.sql", "original_file_path": "macros/adapters/indexes.sql", "unique_id": "macro.dbt.default__get_create_index_sql", "macro_sql": "{% macro default__get_create_index_sql(relation, index_dict) -%}\n {% do return(None) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.679399, "supported_languages": null}, "macro.dbt.create_indexes": {"name": "create_indexes", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/indexes.sql", "original_file_path": "macros/adapters/indexes.sql", "unique_id": "macro.dbt.create_indexes", 
"macro_sql": "{% macro create_indexes(relation) -%}\n {{ adapter.dispatch('create_indexes', 'dbt')(relation) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.default__create_indexes"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.679533, "supported_languages": null}, "macro.dbt.default__create_indexes": {"name": "default__create_indexes", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/indexes.sql", "original_file_path": "macros/adapters/indexes.sql", "unique_id": "macro.dbt.default__create_indexes", "macro_sql": "{% macro default__create_indexes(relation) -%}\n {%- set _indexes = config.get('indexes', default=[]) -%}\n\n {% for _index_dict in _indexes %}\n {% set create_index_sql = get_create_index_sql(relation, _index_dict) %}\n {% if create_index_sql %}\n {% do run_query(create_index_sql) %}\n {% endif %}\n {% endfor %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.get_create_index_sql", "macro.dbt.run_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.679878, "supported_languages": null}, "macro.dbt.make_intermediate_relation": {"name": "make_intermediate_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.make_intermediate_relation", "macro_sql": "{% macro make_intermediate_relation(base_relation, suffix='__dbt_tmp') %}\n {{ return(adapter.dispatch('make_intermediate_relation', 'dbt')(base_relation, suffix)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__make_intermediate_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.682687, "supported_languages": null}, "macro.dbt.default__make_intermediate_relation": {"name": "default__make_intermediate_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__make_intermediate_relation", "macro_sql": "{% macro default__make_intermediate_relation(base_relation, suffix) %}\n {{ return(default__make_temp_relation(base_relation, suffix)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__make_temp_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.682827, "supported_languages": null}, "macro.dbt.make_temp_relation": {"name": "make_temp_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.make_temp_relation", "macro_sql": "{% macro make_temp_relation(base_relation, suffix='__dbt_tmp') %}\n {{ return(adapter.dispatch('make_temp_relation', 'dbt')(base_relation, suffix)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__make_temp_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.683017, "supported_languages": null}, "macro.dbt.default__make_temp_relation": {"name": "default__make_temp_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", 
"original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__make_temp_relation", "macro_sql": "{% macro default__make_temp_relation(base_relation, suffix) %}\n {%- set temp_identifier = base_relation.identifier ~ suffix -%}\n {%- set temp_relation = base_relation.incorporate(\n path={\"identifier\": temp_identifier}) -%}\n\n {{ return(temp_relation) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.683267, "supported_languages": null}, "macro.dbt.make_backup_relation": {"name": "make_backup_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.make_backup_relation", "macro_sql": "{% macro make_backup_relation(base_relation, backup_relation_type, suffix='__dbt_backup') %}\n {{ return(adapter.dispatch('make_backup_relation', 'dbt')(base_relation, backup_relation_type, suffix)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__make_backup_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.68348, "supported_languages": null}, "macro.dbt.default__make_backup_relation": {"name": "default__make_backup_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__make_backup_relation", "macro_sql": "{% macro default__make_backup_relation(base_relation, backup_relation_type, suffix) %}\n {%- set backup_identifier = base_relation.identifier ~ suffix -%}\n {%- set backup_relation = base_relation.incorporate(\n path={\"identifier\": backup_identifier},\n type=backup_relation_type\n ) -%}\n {{ return(backup_relation) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.683755, "supported_languages": null}, "macro.dbt.drop_relation": {"name": "drop_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.drop_relation", "macro_sql": "{% macro drop_relation(relation) -%}\n {{ return(adapter.dispatch('drop_relation', 'dbt')(relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__drop_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.683913, "supported_languages": null}, "macro.dbt.default__drop_relation": {"name": "default__drop_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__drop_relation", "macro_sql": "{% macro default__drop_relation(relation) -%}\n {% call statement('drop_relation', auto_begin=False) -%}\n drop {{ relation.type }} if exists {{ relation }} cascade\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.684098, "supported_languages": null}, "macro.dbt.truncate_relation": {"name": 
"truncate_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.truncate_relation", "macro_sql": "{% macro truncate_relation(relation) -%}\n {{ return(adapter.dispatch('truncate_relation', 'dbt')(relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__truncate_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.684253, "supported_languages": null}, "macro.dbt.default__truncate_relation": {"name": "default__truncate_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__truncate_relation", "macro_sql": "{% macro default__truncate_relation(relation) -%}\n {% call statement('truncate_relation') -%}\n truncate table {{ relation }}\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6843882, "supported_languages": null}, "macro.dbt.rename_relation": {"name": "rename_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.rename_relation", "macro_sql": "{% macro rename_relation(from_relation, to_relation) -%}\n {{ return(adapter.dispatch('rename_relation', 'dbt')(from_relation, to_relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__rename_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.684561, "supported_languages": null}, "macro.dbt.default__rename_relation": {"name": "default__rename_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__rename_relation", "macro_sql": "{% macro default__rename_relation(from_relation, to_relation) -%}\n {% set target_name = adapter.quote_as_configured(to_relation.identifier, 'identifier') %}\n {% call statement('rename_relation') -%}\n alter table {{ from_relation }} rename to {{ target_name }}\n {%- endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.684798, "supported_languages": null}, "macro.dbt.get_or_create_relation": {"name": "get_or_create_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.get_or_create_relation", "macro_sql": "{% macro get_or_create_relation(database, schema, identifier, type) -%}\n {{ return(adapter.dispatch('get_or_create_relation', 'dbt')(database, schema, identifier, type)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_or_create_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.685013, "supported_languages": null}, "macro.dbt.default__get_or_create_relation": {"name": 
"default__get_or_create_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.default__get_or_create_relation", "macro_sql": "{% macro default__get_or_create_relation(database, schema, identifier, type) %}\n {%- set target_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) %}\n\n {% if target_relation %}\n {% do return([true, target_relation]) %}\n {% endif %}\n\n {%- set new_relation = api.Relation.create(\n database=database,\n schema=schema,\n identifier=identifier,\n type=type\n ) -%}\n {% do return([false, new_relation]) %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.685487, "supported_languages": null}, "macro.dbt.load_cached_relation": {"name": "load_cached_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.load_cached_relation", "macro_sql": "{% macro load_cached_relation(relation) %}\n {% do return(adapter.get_relation(\n database=relation.database,\n schema=relation.schema,\n identifier=relation.identifier\n )) -%}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.68575, "supported_languages": null}, "macro.dbt.load_relation": {"name": "load_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.load_relation", "macro_sql": "{% macro load_relation(relation) %}\n {{ return(load_cached_relation(relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.load_cached_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6858728, "supported_languages": null}, "macro.dbt.drop_relation_if_exists": {"name": "drop_relation_if_exists", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/relation.sql", "original_file_path": "macros/adapters/relation.sql", "unique_id": "macro.dbt.drop_relation_if_exists", "macro_sql": "{% macro drop_relation_if_exists(relation) %}\n {% if relation is not none %}\n {{ adapter.drop_relation(relation) }}\n {% endif %}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6860359, "supported_languages": null}, "macro.dbt.collect_freshness": {"name": "collect_freshness", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/freshness.sql", "original_file_path": "macros/adapters/freshness.sql", "unique_id": "macro.dbt.collect_freshness", "macro_sql": "{% macro collect_freshness(source, loaded_at_field, filter) %}\n {{ return(adapter.dispatch('collect_freshness', 'dbt')(source, loaded_at_field, filter))}}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__collect_freshness"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.686399, "supported_languages": null}, "macro.dbt.default__collect_freshness": {"name": 
"default__collect_freshness", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/freshness.sql", "original_file_path": "macros/adapters/freshness.sql", "unique_id": "macro.dbt.default__collect_freshness", "macro_sql": "{% macro default__collect_freshness(source, loaded_at_field, filter) %}\n {% call statement('collect_freshness', fetch_result=True, auto_begin=False) -%}\n select\n max({{ loaded_at_field }}) as max_loaded_at,\n {{ current_timestamp() }} as snapshotted_at\n from {{ source }}\n {% if filter %}\n where {{ filter }}\n {% endif %}\n {% endcall %}\n {{ return(load_result('collect_freshness').table) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement", "macro.dbt.current_timestamp"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.686757, "supported_languages": null}, "macro.dbt.copy_grants": {"name": "copy_grants", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.copy_grants", "macro_sql": "{% macro copy_grants() %}\n {{ return(adapter.dispatch('copy_grants', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__copy_grants"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.688231, "supported_languages": null}, "macro.dbt.default__copy_grants": {"name": "default__copy_grants", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__copy_grants", "macro_sql": "{% macro default__copy_grants() %}\n {{ return(True) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.688329, "supported_languages": null}, "macro.dbt.support_multiple_grantees_per_dcl_statement": {"name": "support_multiple_grantees_per_dcl_statement", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.support_multiple_grantees_per_dcl_statement", "macro_sql": "{% macro support_multiple_grantees_per_dcl_statement() %}\n {{ return(adapter.dispatch('support_multiple_grantees_per_dcl_statement', 'dbt')()) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__support_multiple_grantees_per_dcl_statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.688475, "supported_languages": null}, "macro.dbt.default__support_multiple_grantees_per_dcl_statement": {"name": "default__support_multiple_grantees_per_dcl_statement", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__support_multiple_grantees_per_dcl_statement", "macro_sql": "\n\n{%- macro default__support_multiple_grantees_per_dcl_statement() -%}\n {{ return(True) }}\n{%- endmacro -%}\n\n\n", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.688568, "supported_languages": null}, 
"macro.dbt.should_revoke": {"name": "should_revoke", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.should_revoke", "macro_sql": "{% macro should_revoke(existing_relation, full_refresh_mode=True) %}\n\n {% if not existing_relation %}\n {#-- The table doesn't already exist, so no grants to copy over --#}\n {{ return(False) }}\n {% elif full_refresh_mode %}\n {#-- The object is being REPLACED -- whether grants are copied over depends on the value of user config --#}\n {{ return(copy_grants()) }}\n {% else %}\n {#-- The table is being merged/upserted/inserted -- grants will be carried over --#}\n {{ return(True) }}\n {% endif %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.copy_grants"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.688861, "supported_languages": null}, "macro.dbt.get_show_grant_sql": {"name": "get_show_grant_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.get_show_grant_sql", "macro_sql": "{% macro get_show_grant_sql(relation) %}\n {{ return(adapter.dispatch(\"get_show_grant_sql\", \"dbt\")(relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_show_grant_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.689023, "supported_languages": null}, "macro.dbt.default__get_show_grant_sql": {"name": "default__get_show_grant_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__get_show_grant_sql", "macro_sql": "{% macro default__get_show_grant_sql(relation) %}\n show grants on {{ relation }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.689109, "supported_languages": null}, "macro.dbt.get_grant_sql": {"name": "get_grant_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.get_grant_sql", "macro_sql": "{% macro get_grant_sql(relation, privilege, grantees) %}\n {{ return(adapter.dispatch('get_grant_sql', 'dbt')(relation, privilege, grantees)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_grant_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.689307, "supported_languages": null}, "macro.dbt.default__get_grant_sql": {"name": "default__get_grant_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__get_grant_sql", "macro_sql": "\n\n{%- macro default__get_grant_sql(relation, privilege, grantees) -%}\n grant {{ privilege }} on {{ relation }} to {{ grantees | join(', ') }}\n{%- endmacro -%}\n\n\n", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 
1680025829.689466, "supported_languages": null}, "macro.dbt.get_revoke_sql": {"name": "get_revoke_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.get_revoke_sql", "macro_sql": "{% macro get_revoke_sql(relation, privilege, grantees) %}\n {{ return(adapter.dispatch('get_revoke_sql', 'dbt')(relation, privilege, grantees)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_revoke_sql"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.689657, "supported_languages": null}, "macro.dbt.default__get_revoke_sql": {"name": "default__get_revoke_sql", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__get_revoke_sql", "macro_sql": "\n\n{%- macro default__get_revoke_sql(relation, privilege, grantees) -%}\n revoke {{ privilege }} on {{ relation }} from {{ grantees | join(', ') }}\n{%- endmacro -%}\n\n\n", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6898172, "supported_languages": null}, "macro.dbt.get_dcl_statement_list": {"name": "get_dcl_statement_list", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.get_dcl_statement_list", "macro_sql": "{% macro get_dcl_statement_list(relation, grant_config, get_dcl_macro) %}\n {{ return(adapter.dispatch('get_dcl_statement_list', 'dbt')(relation, grant_config, get_dcl_macro)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_dcl_statement_list"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.690012, "supported_languages": null}, "macro.dbt.default__get_dcl_statement_list": {"name": "default__get_dcl_statement_list", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__get_dcl_statement_list", "macro_sql": "\n\n{%- macro default__get_dcl_statement_list(relation, grant_config, get_dcl_macro) -%}\n {#\n -- Unpack grant_config into specific privileges and the set of users who need them granted/revoked.\n -- Depending on whether this database supports multiple grantees per statement, pass in the list of\n -- all grantees per privilege, or (if not) template one statement per privilege-grantee pair.\n -- `get_dcl_macro` will be either `get_grant_sql` or `get_revoke_sql`\n #}\n {%- set dcl_statements = [] -%}\n {%- for privilege, grantees in grant_config.items() %}\n {%- if support_multiple_grantees_per_dcl_statement() and grantees -%}\n {%- set dcl = get_dcl_macro(relation, privilege, grantees) -%}\n {%- do dcl_statements.append(dcl) -%}\n {%- else -%}\n {%- for grantee in grantees -%}\n {% set dcl = get_dcl_macro(relation, privilege, [grantee]) %}\n {%- do dcl_statements.append(dcl) -%}\n {% endfor -%}\n {%- endif -%}\n {%- endfor -%}\n {{ return(dcl_statements) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt.support_multiple_grantees_per_dcl_statement"]}, "description": "", "meta": {}, "docs": {"show": 
true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.690586, "supported_languages": null}, "macro.dbt.call_dcl_statements": {"name": "call_dcl_statements", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.call_dcl_statements", "macro_sql": "{% macro call_dcl_statements(dcl_statement_list) %}\n {{ return(adapter.dispatch(\"call_dcl_statements\", \"dbt\")(dcl_statement_list)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__call_dcl_statements"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.690748, "supported_languages": null}, "macro.dbt.default__call_dcl_statements": {"name": "default__call_dcl_statements", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__call_dcl_statements", "macro_sql": "{% macro default__call_dcl_statements(dcl_statement_list) %}\n {#\n -- By default, supply all grant + revoke statements in a single semicolon-separated block,\n -- so that they're all processed together.\n\n -- Some databases do not support this. Those adapters will need to override this macro\n -- to run each statement individually.\n #}\n {% call statement('grants') %}\n {% for dcl_statement in dcl_statement_list %}\n {{ dcl_statement }};\n {% endfor %}\n {% endcall %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.690955, "supported_languages": null}, "macro.dbt.apply_grants": {"name": "apply_grants", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.apply_grants", "macro_sql": "{% macro apply_grants(relation, grant_config, should_revoke) %}\n {{ return(adapter.dispatch(\"apply_grants\", \"dbt\")(relation, grant_config, should_revoke)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__apply_grants"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.691225, "supported_languages": null}, "macro.dbt.default__apply_grants": {"name": "default__apply_grants", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/apply_grants.sql", "original_file_path": "macros/adapters/apply_grants.sql", "unique_id": "macro.dbt.default__apply_grants", "macro_sql": "{% macro default__apply_grants(relation, grant_config, should_revoke=True) %}\n {#-- If grant_config is {} or None, this is a no-op --#}\n {% if grant_config %}\n {% if should_revoke %}\n {#-- We think previous grants may have carried over --#}\n {#-- Show current grants and calculate diffs --#}\n {% set current_grants_table = run_query(get_show_grant_sql(relation)) %}\n {% set current_grants_dict = adapter.standardize_grants_dict(current_grants_table) %}\n {% set needs_granting = diff_of_two_dicts(grant_config, current_grants_dict) %}\n {% set needs_revoking = diff_of_two_dicts(current_grants_dict, grant_config) %}\n {% if not (needs_granting or needs_revoking) %}\n {{ log('On ' ~ relation ~': All grants are in place, no revocation or granting needed.')}}\n 
{% endif %}\n {% else %}\n {#-- We don't think there's any chance of previous grants having carried over. --#}\n {#-- Jump straight to granting what the user has configured. --#}\n {% set needs_revoking = {} %}\n {% set needs_granting = grant_config %}\n {% endif %}\n {% if needs_granting or needs_revoking %}\n {% set revoke_statement_list = get_dcl_statement_list(relation, needs_revoking, get_revoke_sql) %}\n {% set grant_statement_list = get_dcl_statement_list(relation, needs_granting, get_grant_sql) %}\n {% set dcl_statement_list = revoke_statement_list + grant_statement_list %}\n {% if dcl_statement_list %}\n {{ call_dcl_statements(dcl_statement_list) }}\n {% endif %}\n {% endif %}\n {% endif %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.run_query", "macro.dbt.get_show_grant_sql", "macro.dbt.get_dcl_statement_list", "macro.dbt.call_dcl_statements"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.692201, "supported_languages": null}, "macro.dbt.alter_column_comment": {"name": "alter_column_comment", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/persist_docs.sql", "original_file_path": "macros/adapters/persist_docs.sql", "unique_id": "macro.dbt.alter_column_comment", "macro_sql": "{% macro alter_column_comment(relation, column_dict) -%}\n {{ return(adapter.dispatch('alter_column_comment', 'dbt')(relation, column_dict)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__alter_column_comment"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.692799, "supported_languages": null}, "macro.dbt.default__alter_column_comment": {"name": "default__alter_column_comment", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/persist_docs.sql", "original_file_path": "macros/adapters/persist_docs.sql", "unique_id": "macro.dbt.default__alter_column_comment", "macro_sql": "{% macro default__alter_column_comment(relation, column_dict) -%}\n {{ exceptions.raise_not_implemented(\n 'alter_column_comment macro not implemented for adapter '+adapter.type()) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.692943, "supported_languages": null}, "macro.dbt.alter_relation_comment": {"name": "alter_relation_comment", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/persist_docs.sql", "original_file_path": "macros/adapters/persist_docs.sql", "unique_id": "macro.dbt.alter_relation_comment", "macro_sql": "{% macro alter_relation_comment(relation, relation_comment) -%}\n {{ return(adapter.dispatch('alter_relation_comment', 'dbt')(relation, relation_comment)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__alter_relation_comment"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6931179, "supported_languages": null}, "macro.dbt.default__alter_relation_comment": {"name": "default__alter_relation_comment", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/persist_docs.sql", "original_file_path": "macros/adapters/persist_docs.sql", "unique_id": "macro.dbt.default__alter_relation_comment", "macro_sql": "{% macro default__alter_relation_comment(relation, relation_comment) -%}\n {{ 
exceptions.raise_not_implemented(\n 'alter_relation_comment macro not implemented for adapter '+adapter.type()) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.69326, "supported_languages": null}, "macro.dbt.persist_docs": {"name": "persist_docs", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/persist_docs.sql", "original_file_path": "macros/adapters/persist_docs.sql", "unique_id": "macro.dbt.persist_docs", "macro_sql": "{% macro persist_docs(relation, model, for_relation=true, for_columns=true) -%}\n {{ return(adapter.dispatch('persist_docs', 'dbt')(relation, model, for_relation, for_columns)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__persist_docs"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.693491, "supported_languages": null}, "macro.dbt.default__persist_docs": {"name": "default__persist_docs", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/persist_docs.sql", "original_file_path": "macros/adapters/persist_docs.sql", "unique_id": "macro.dbt.default__persist_docs", "macro_sql": "{% macro default__persist_docs(relation, model, for_relation, for_columns) -%}\n {% if for_relation and config.persist_relation_docs() and model.description %}\n {% do run_query(alter_relation_comment(relation, model.description)) %}\n {% endif %}\n\n {% if for_columns and config.persist_column_docs() and model.columns %}\n {% do run_query(alter_column_comment(relation, model.columns)) %}\n {% endif %}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.run_query", "macro.dbt.alter_relation_comment", "macro.dbt.alter_column_comment"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.693917, "supported_languages": null}, "macro.dbt.get_catalog": {"name": "get_catalog", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.get_catalog", "macro_sql": "{% macro get_catalog(information_schema, schemas) -%}\n {{ return(adapter.dispatch('get_catalog', 'dbt')(information_schema, schemas)) }}\n{%- endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_catalog"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.695193, "supported_languages": null}, "macro.dbt.default__get_catalog": {"name": "default__get_catalog", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.default__get_catalog", "macro_sql": "{% macro default__get_catalog(information_schema, schemas) -%}\n\n {% set typename = adapter.type() %}\n {% set msg -%}\n get_catalog not implemented for {{ typename }}\n {%- endset %}\n\n {{ exceptions.raise_compiler_error(msg) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6954129, "supported_languages": null}, "macro.dbt.information_schema_name": {"name": "information_schema_name", "resource_type": "macro", "package_name": "dbt", "path": 
"macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.information_schema_name", "macro_sql": "{% macro information_schema_name(database) %}\n {{ return(adapter.dispatch('information_schema_name', 'dbt')(database)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__information_schema_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.695571, "supported_languages": null}, "macro.dbt.default__information_schema_name": {"name": "default__information_schema_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.default__information_schema_name", "macro_sql": "{% macro default__information_schema_name(database) -%}\n {%- if database -%}\n {{ database }}.INFORMATION_SCHEMA\n {%- else -%}\n INFORMATION_SCHEMA\n {%- endif -%}\n{%- endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.695704, "supported_languages": null}, "macro.dbt.list_schemas": {"name": "list_schemas", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.list_schemas", "macro_sql": "{% macro list_schemas(database) -%}\n {{ return(adapter.dispatch('list_schemas', 'dbt')(database)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__list_schemas"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6958551, "supported_languages": null}, "macro.dbt.default__list_schemas": {"name": "default__list_schemas", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.default__list_schemas", "macro_sql": "{% macro default__list_schemas(database) -%}\n {% set sql %}\n select distinct schema_name\n from {{ information_schema_name(database) }}.SCHEMATA\n where catalog_name ilike '{{ database }}'\n {% endset %}\n {{ return(run_query(sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.information_schema_name", "macro.dbt.run_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.696067, "supported_languages": null}, "macro.dbt.check_schema_exists": {"name": "check_schema_exists", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.check_schema_exists", "macro_sql": "{% macro check_schema_exists(information_schema, schema) -%}\n {{ return(adapter.dispatch('check_schema_exists', 'dbt')(information_schema, schema)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__check_schema_exists"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6962419, "supported_languages": null}, "macro.dbt.default__check_schema_exists": {"name": "default__check_schema_exists", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": 
"macros/adapters/metadata.sql", "unique_id": "macro.dbt.default__check_schema_exists", "macro_sql": "{% macro default__check_schema_exists(information_schema, schema) -%}\n {% set sql -%}\n select count(*)\n from {{ information_schema.replace(information_schema_view='SCHEMATA') }}\n where catalog_name='{{ information_schema.database }}'\n and schema_name='{{ schema }}'\n {%- endset %}\n {{ return(run_query(sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.replace", "macro.dbt.run_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.696498, "supported_languages": null}, "macro.dbt.list_relations_without_caching": {"name": "list_relations_without_caching", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.list_relations_without_caching", "macro_sql": "{% macro list_relations_without_caching(schema_relation) %}\n {{ return(adapter.dispatch('list_relations_without_caching', 'dbt')(schema_relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__list_relations_without_caching"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.696657, "supported_languages": null}, "macro.dbt.default__list_relations_without_caching": {"name": "default__list_relations_without_caching", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/metadata.sql", "original_file_path": "macros/adapters/metadata.sql", "unique_id": "macro.dbt.default__list_relations_without_caching", "macro_sql": "{% macro default__list_relations_without_caching(schema_relation) %}\n {{ exceptions.raise_not_implemented(\n 'list_relations_without_caching macro not implemented for adapter '+adapter.type()) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6967921, "supported_languages": null}, "macro.dbt.get_columns_in_relation": {"name": "get_columns_in_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.get_columns_in_relation", "macro_sql": "{% macro get_columns_in_relation(relation) -%}\n {{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt_duckdb.duckdb__get_columns_in_relation"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.698245, "supported_languages": null}, "macro.dbt.default__get_columns_in_relation": {"name": "default__get_columns_in_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.default__get_columns_in_relation", "macro_sql": "{% macro default__get_columns_in_relation(relation) -%}\n {{ exceptions.raise_not_implemented(\n 'get_columns_in_relation macro not implemented for adapter '+adapter.type()) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.698378, "supported_languages": null}, 
"macro.dbt.sql_convert_columns_in_relation": {"name": "sql_convert_columns_in_relation", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.sql_convert_columns_in_relation", "macro_sql": "{% macro sql_convert_columns_in_relation(table) -%}\n {% set columns = [] %}\n {% for row in table %}\n {% do columns.append(api.Column(*row)) %}\n {% endfor %}\n {{ return(columns) }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.69872, "supported_languages": null}, "macro.dbt.get_columns_in_query": {"name": "get_columns_in_query", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.get_columns_in_query", "macro_sql": "{% macro get_columns_in_query(select_sql) -%}\n {{ return(adapter.dispatch('get_columns_in_query', 'dbt')(select_sql)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__get_columns_in_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.698879, "supported_languages": null}, "macro.dbt.default__get_columns_in_query": {"name": "default__get_columns_in_query", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.default__get_columns_in_query", "macro_sql": "{% macro default__get_columns_in_query(select_sql) %}\n {% call statement('get_columns_in_query', fetch_result=True, auto_begin=False) -%}\n select * from (\n {{ select_sql }}\n ) as __dbt_sbq\n where false\n limit 0\n {% endcall %}\n\n {{ return(load_result('get_columns_in_query').table.columns | map(attribute='name') | list) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.699171, "supported_languages": null}, "macro.dbt.alter_column_type": {"name": "alter_column_type", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.alter_column_type", "macro_sql": "{% macro alter_column_type(relation, column_name, new_column_type) -%}\n {{ return(adapter.dispatch('alter_column_type', 'dbt')(relation, column_name, new_column_type)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__alter_column_type"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.699368, "supported_languages": null}, "macro.dbt.default__alter_column_type": {"name": "default__alter_column_type", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.default__alter_column_type", "macro_sql": "{% macro default__alter_column_type(relation, column_name, new_column_type) -%}\n {#\n 1. Create a new column (w/ temp name and correct type)\n 2. Copy data over to it\n 3. Drop the existing column (cascade!)\n 4. 
Rename the new column to existing column\n #}\n {%- set tmp_column = column_name + \"__dbt_alter\" -%}\n\n {% call statement('alter_column_type') %}\n alter table {{ relation }} add column {{ adapter.quote(tmp_column) }} {{ new_column_type }};\n update {{ relation }} set {{ adapter.quote(tmp_column) }} = {{ adapter.quote(column_name) }};\n alter table {{ relation }} drop column {{ adapter.quote(column_name) }} cascade;\n alter table {{ relation }} rename column {{ adapter.quote(tmp_column) }} to {{ adapter.quote(column_name) }}\n {% endcall %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.statement"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.6998851, "supported_languages": null}, "macro.dbt.alter_relation_add_remove_columns": {"name": "alter_relation_add_remove_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.alter_relation_add_remove_columns", "macro_sql": "{% macro alter_relation_add_remove_columns(relation, add_columns = none, remove_columns = none) -%}\n {{ return(adapter.dispatch('alter_relation_add_remove_columns', 'dbt')(relation, add_columns, remove_columns)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__alter_relation_add_remove_columns"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.700109, "supported_languages": null}, "macro.dbt.default__alter_relation_add_remove_columns": {"name": "default__alter_relation_add_remove_columns", "resource_type": "macro", "package_name": "dbt", "path": "macros/adapters/columns.sql", "original_file_path": "macros/adapters/columns.sql", "unique_id": "macro.dbt.default__alter_relation_add_remove_columns", "macro_sql": "{% macro default__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}\n\n {% if add_columns is none %}\n {% set add_columns = [] %}\n {% endif %}\n {% if remove_columns is none %}\n {% set remove_columns = [] %}\n {% endif %}\n\n {% set sql -%}\n\n alter {{ relation.type }} {{ relation }}\n\n {% for column in add_columns %}\n add column {{ column.name }} {{ column.data_type }}{{ ',' if not loop.last }}\n {% endfor %}{{ ',' if add_columns and remove_columns }}\n\n {% for column in remove_columns %}\n drop column {{ column.name }}{{ ',' if not loop.last }}\n {% endfor %}\n\n {%- endset -%}\n\n {% do run_query(sql) %}\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.run_query"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.700779, "supported_languages": null}, "macro.dbt.resolve_model_name": {"name": "resolve_model_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.resolve_model_name", "macro_sql": "{% macro resolve_model_name(input_model_name) %}\n {{ return(adapter.dispatch('resolve_model_name', 'dbt')(input_model_name)) }}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.default__resolve_model_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.701974, "supported_languages": null}, "macro.dbt.default__resolve_model_name": {"name": 
"default__resolve_model_name", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.default__resolve_model_name", "macro_sql": "\n\n{%- macro default__resolve_model_name(input_model_name) -%}\n {{ input_model_name | string | replace('\"', '\\\"') }}\n{%- endmacro -%}\n\n", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.702105, "supported_languages": null}, "macro.dbt.build_ref_function": {"name": "build_ref_function", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.build_ref_function", "macro_sql": "{% macro build_ref_function(model) %}\n\n {%- set ref_dict = {} -%}\n {%- for _ref in model.refs -%}\n {%- set resolved = ref(*_ref) -%}\n {%- do ref_dict.update({_ref | join('.'): resolve_model_name(resolved)}) -%}\n {%- endfor -%}\n\ndef ref(*args,dbt_load_df_function):\n refs = {{ ref_dict | tojson }}\n key = '.'.join(args)\n return dbt_load_df_function(refs[key])\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.resolve_model_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.702455, "supported_languages": null}, "macro.dbt.build_source_function": {"name": "build_source_function", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.build_source_function", "macro_sql": "{% macro build_source_function(model) %}\n\n {%- set source_dict = {} -%}\n {%- for _source in model.sources -%}\n {%- set resolved = source(*_source) -%}\n {%- do source_dict.update({_source | join('.'): resolve_model_name(resolved)}) -%}\n {%- endfor -%}\n\ndef source(*args, dbt_load_df_function):\n sources = {{ source_dict | tojson }}\n key = '.'.join(args)\n return dbt_load_df_function(sources[key])\n\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.resolve_model_name"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.702801, "supported_languages": null}, "macro.dbt.build_config_dict": {"name": "build_config_dict", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.build_config_dict", "macro_sql": "{% macro build_config_dict(model) %}\n {%- set config_dict = {} -%}\n {% set config_dbt_used = zip(model.config.config_keys_used, model.config.config_keys_defaults) | list %}\n {%- for key, default in config_dbt_used -%}\n {# weird type testing with enum, would be much easier to write this logic in Python! 
#}\n {%- if key == \"language\" -%}\n {%- set value = \"python\" -%}\n {%- endif -%}\n {%- set value = model.config.get(key, default) -%}\n {%- do config_dict.update({key: value}) -%}\n {%- endfor -%}\nconfig_dict = {{ config_dict }}\n{% endmacro %}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.7033, "supported_languages": null}, "macro.dbt.py_script_postfix": {"name": "py_script_postfix", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.py_script_postfix", "macro_sql": "{% macro py_script_postfix(model) %}\n# This part is user provided model code\n# you will need to copy the next section to run the code\n# COMMAND ----------\n# this part is dbt logic for get ref work, do not modify\n\n{{ build_ref_function(model ) }}\n{{ build_source_function(model ) }}\n{{ build_config_dict(model) }}\n\nclass config:\n def __init__(self, *args, **kwargs):\n pass\n\n @staticmethod\n def get(key, default=None):\n return config_dict.get(key, default)\n\nclass this:\n \"\"\"dbt.this() or dbt.this.identifier\"\"\"\n database = \"{{ this.database }}\"\n schema = \"{{ this.schema }}\"\n identifier = \"{{ this.identifier }}\"\n {% set this_relation_name = resolve_model_name(this) %}\n def __repr__(self):\n return '{{ this_relation_name }}'\n\n\nclass dbtObj:\n def __init__(self, load_df_function) -> None:\n self.source = lambda *args: source(*args, dbt_load_df_function=load_df_function)\n self.ref = lambda *args: ref(*args, dbt_load_df_function=load_df_function)\n self.config = config\n self.this = this()\n self.is_incremental = {{ is_incremental() }}\n\n# COMMAND ----------\n{{py_script_comment()}}\n{% endmacro %}", "depends_on": {"macros": ["macro.dbt.build_ref_function", "macro.dbt.build_source_function", "macro.dbt.build_config_dict", "macro.dbt.resolve_model_name", "macro.dbt.is_incremental", "macro.dbt.py_script_comment"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.703707, "supported_languages": null}, "macro.dbt.py_script_comment": {"name": "py_script_comment", "resource_type": "macro", "package_name": "dbt", "path": "macros/python_model/python.sql", "original_file_path": "macros/python_model/python.sql", "unique_id": "macro.dbt.py_script_comment", "macro_sql": "{%macro py_script_comment()%}\n{%endmacro%}", "depends_on": {"macros": []}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.7037702, "supported_languages": null}, "macro.dbt.test_unique": {"name": "test_unique", "resource_type": "macro", "package_name": "dbt", "path": "tests/generic/builtin.sql", "original_file_path": "tests/generic/builtin.sql", "unique_id": "macro.dbt.test_unique", "macro_sql": "{% test unique(model, column_name) %}\n {% set macro = adapter.dispatch('test_unique', 'dbt') %}\n {{ macro(model, column_name) }}\n{% endtest %}", "depends_on": {"macros": ["macro.dbt.default__test_unique"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.704208, "supported_languages": null}, "macro.dbt.test_not_null": {"name": "test_not_null", "resource_type": "macro", "package_name": "dbt", "path": "tests/generic/builtin.sql", 
"original_file_path": "tests/generic/builtin.sql", "unique_id": "macro.dbt.test_not_null", "macro_sql": "{% test not_null(model, column_name) %}\n {% set macro = adapter.dispatch('test_not_null', 'dbt') %}\n {{ macro(model, column_name) }}\n{% endtest %}", "depends_on": {"macros": ["macro.dbt.default__test_not_null"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.704405, "supported_languages": null}, "macro.dbt.test_accepted_values": {"name": "test_accepted_values", "resource_type": "macro", "package_name": "dbt", "path": "tests/generic/builtin.sql", "original_file_path": "tests/generic/builtin.sql", "unique_id": "macro.dbt.test_accepted_values", "macro_sql": "{% test accepted_values(model, column_name, values, quote=True) %}\n {% set macro = adapter.dispatch('test_accepted_values', 'dbt') %}\n {{ macro(model, column_name, values, quote) }}\n{% endtest %}", "depends_on": {"macros": ["macro.dbt.default__test_accepted_values"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.704649, "supported_languages": null}, "macro.dbt.test_relationships": {"name": "test_relationships", "resource_type": "macro", "package_name": "dbt", "path": "tests/generic/builtin.sql", "original_file_path": "tests/generic/builtin.sql", "unique_id": "macro.dbt.test_relationships", "macro_sql": "{% test relationships(model, column_name, to, field) %}\n {% set macro = adapter.dispatch('test_relationships', 'dbt') %}\n {{ macro(model, column_name, to, field) }}\n{% endtest %}", "depends_on": {"macros": ["macro.dbt.default__test_relationships"]}, "description": "", "meta": {}, "docs": {"show": true, "node_color": null}, "patch_path": null, "arguments": [], "created_at": 1680025829.704882, "supported_languages": null}}, "docs": {"doc.jaffle_shop.__overview__": {"name": "__overview__", "resource_type": "doc", "package_name": "jaffle_shop", "path": "overview.md", "original_file_path": "models/overview.md", "unique_id": "doc.jaffle_shop.__overview__", "block_contents": "## Data Documentation for Jaffle Shop\n\n`jaffle_shop` is a fictional ecommerce store.\n\nThis [dbt](https://www.getdbt.com/) project is for testing out code.\n\nThe source code can be found [here](https://github.com/clrcrl/jaffle_shop)."}, "doc.jaffle_shop.orders_status": {"name": "orders_status", "resource_type": "doc", "package_name": "jaffle_shop", "path": "docs.md", "original_file_path": "models/docs.md", "unique_id": "doc.jaffle_shop.orders_status", "block_contents": "Orders can be one of the following statuses:\n\n| status | description |\n|----------------|------------------------------------------------------------------------------------------------------------------------|\n| placed | The order has been placed but has not yet left the warehouse |\n| shipped | The order has ben shipped to the customer and is currently in transit |\n| completed | The order has been received by the customer |\n| return_pending | The customer has indicated that they would like to return the order, but it has not yet been received at the warehouse |\n| returned | The order has been returned by the customer and received at the warehouse |"}, "doc.dbt.__overview__": {"name": "__overview__", "resource_type": "doc", "package_name": "dbt", "path": "overview.md", "original_file_path": "docs/overview.md", "unique_id": "doc.dbt.__overview__", "block_contents": "### Welcome!\n\nWelcome to the 
auto-generated documentation for your dbt project!\n\n### Navigation\n\nYou can use the `Project` and `Database` navigation tabs on the left side of the window to explore the models\nin your project.\n\n#### Project Tab\nThe `Project` tab mirrors the directory structure of your dbt project. In this tab, you can see all of the\nmodels defined in your dbt project, as well as models imported from dbt packages.\n\n#### Database Tab\nThe `Database` tab also exposes your models, but in a format that looks more like a database explorer. This view\nshows relations (tables and views) grouped into database schemas. Note that ephemeral models are _not_ shown\nin this interface, as they do not exist in the database.\n\n### Graph Exploration\nYou can click the blue icon on the bottom-right corner of the page to view the lineage graph of your models.\n\nOn model pages, you'll see the immediate parents and children of the model you're exploring. By clicking the `Expand`\nbutton at the top-right of this lineage pane, you'll be able to see all of the models that are used to build,\nor are built from, the model you're exploring.\n\nOnce expanded, you'll be able to use the `--select` and `--exclude` model selection syntax to filter the\nmodels in the graph. For more information on model selection, check out the [dbt docs](https://docs.getdbt.com/docs/model-selection-syntax).\n\nNote that you can also right-click on models to interactively filter and explore the graph.\n\n---\n\n### More information\n\n- [What is dbt](https://docs.getdbt.com/docs/introduction)?\n- Read the [dbt viewpoint](https://docs.getdbt.com/docs/viewpoint)\n- [Installation](https://docs.getdbt.com/docs/installation)\n- Join the [dbt Community](https://www.getdbt.com/community/) for questions and discussion"}}, "exposures": {}, "metrics": {}, "selectors": {}, "disabled": {}, "parent_map": {"model.jaffle_shop.customers": ["model.jaffle_shop.stg_customers", "model.jaffle_shop.stg_orders", "model.jaffle_shop.stg_payments"], "model.jaffle_shop.orders": ["model.jaffle_shop.stg_orders", "model.jaffle_shop.stg_payments"], "model.jaffle_shop.stg_customers": ["seed.jaffle_shop.raw_customers"], "model.jaffle_shop.stg_payments": ["seed.jaffle_shop.raw_payments"], "model.jaffle_shop.stg_orders": ["seed.jaffle_shop.raw_orders"], "seed.jaffle_shop.raw_customers": [], "seed.jaffle_shop.raw_orders": [], "seed.jaffle_shop.raw_payments": [], "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1": ["model.jaffle_shop.customers"], "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d": ["model.jaffle_shop.customers"], "test.jaffle_shop.unique_orders_order_id.fed79b3a6e": ["model.jaffle_shop.orders"], "test.jaffle_shop.not_null_orders_order_id.cf6c17daed": ["model.jaffle_shop.orders"], "test.jaffle_shop.not_null_orders_customer_id.c5f02694af": ["model.jaffle_shop.orders"], "test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2": ["model.jaffle_shop.customers", "model.jaffle_shop.orders"], "test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3": ["model.jaffle_shop.orders"], "test.jaffle_shop.not_null_orders_amount.106140f9fd": ["model.jaffle_shop.orders"], "test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59": ["model.jaffle_shop.orders"], "test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625": ["model.jaffle_shop.orders"], "test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49": ["model.jaffle_shop.orders"], 
"test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a": ["model.jaffle_shop.orders"], "test.jaffle_shop.unique_stg_customers_customer_id.c7614daada": ["model.jaffle_shop.stg_customers"], "test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa": ["model.jaffle_shop.stg_customers"], "test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a": ["model.jaffle_shop.stg_orders"], "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64": ["model.jaffle_shop.stg_orders"], "test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad": ["model.jaffle_shop.stg_orders"], "test.jaffle_shop.unique_stg_payments_payment_id.3744510712": ["model.jaffle_shop.stg_payments"], "test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075": ["model.jaffle_shop.stg_payments"], "test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278": ["model.jaffle_shop.stg_payments"]}, "child_map": {"model.jaffle_shop.customers": ["test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d", "test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2", "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"], "model.jaffle_shop.orders": ["test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3", "test.jaffle_shop.not_null_orders_amount.106140f9fd", "test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49", "test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625", "test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59", "test.jaffle_shop.not_null_orders_customer_id.c5f02694af", "test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a", "test.jaffle_shop.not_null_orders_order_id.cf6c17daed", "test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2", "test.jaffle_shop.unique_orders_order_id.fed79b3a6e"], "model.jaffle_shop.stg_customers": ["model.jaffle_shop.customers", "test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa", "test.jaffle_shop.unique_stg_customers_customer_id.c7614daada"], "model.jaffle_shop.stg_payments": ["model.jaffle_shop.customers", "model.jaffle_shop.orders", "test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278", "test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075", "test.jaffle_shop.unique_stg_payments_payment_id.3744510712"], "model.jaffle_shop.stg_orders": ["model.jaffle_shop.customers", "model.jaffle_shop.orders", "test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad", "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64", "test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a"], "seed.jaffle_shop.raw_customers": ["model.jaffle_shop.stg_customers"], "seed.jaffle_shop.raw_orders": ["model.jaffle_shop.stg_orders"], "seed.jaffle_shop.raw_payments": ["model.jaffle_shop.stg_payments"], "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1": [], "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d": [], "test.jaffle_shop.unique_orders_order_id.fed79b3a6e": [], "test.jaffle_shop.not_null_orders_order_id.cf6c17daed": [], "test.jaffle_shop.not_null_orders_customer_id.c5f02694af": [], "test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2": [], 
"test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3": [], "test.jaffle_shop.not_null_orders_amount.106140f9fd": [], "test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59": [], "test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625": [], "test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49": [], "test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a": [], "test.jaffle_shop.unique_stg_customers_customer_id.c7614daada": [], "test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa": [], "test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a": [], "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64": [], "test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad": [], "test.jaffle_shop.unique_stg_payments_payment_id.3744510712": [], "test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075": [], "test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278": []}}
\ No newline at end of file
diff --git a/tests/dbt_artifacts/target/run_results.json b/tests/dbt_artifacts/target/run_results.json
new file mode 100644
index 00000000..6886685e
--- /dev/null
+++ b/tests/dbt_artifacts/target/run_results.json
@@ -0,0 +1 @@
+{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v4.json", "dbt_version": "1.4.5", "generated_at": "2023-03-28T17:53:00.424212Z", "invocation_id": "289d7789-15b8-44be-a1f6-828f34858212", "env": {}}, "results": [{"status": "success", "timing": [{"name": "compile", "started_at": "2023-03-28T17:53:00.283978Z", "completed_at": "2023-03-28T17:53:00.285641Z"}, {"name": "execute", "started_at": "2023-03-28T17:53:00.285856Z", "completed_at": "2023-03-28T17:53:00.341543Z"}], "thread_id": "Thread-1 (worker)", "execution_time": 0.059603214263916016, "adapter_response": {"_message": "OK"}, "message": "OK", "failures": null, "unique_id": "model.jaffle_shop.stg_customers"}, {"status": "success", "timing": [{"name": "compile", "started_at": "2023-03-28T17:53:00.345114Z", "completed_at": "2023-03-28T17:53:00.346354Z"}, {"name": "execute", "started_at": "2023-03-28T17:53:00.346549Z", "completed_at": "2023-03-28T17:53:00.357085Z"}], "thread_id": "Thread-1 (worker)", "execution_time": 0.013532876968383789, "adapter_response": {"_message": "OK"}, "message": "OK", "failures": null, "unique_id": "model.jaffle_shop.stg_orders"}, {"status": "success", "timing": [{"name": "compile", "started_at": "2023-03-28T17:53:00.359417Z", "completed_at": "2023-03-28T17:53:00.360809Z"}, {"name": "execute", "started_at": "2023-03-28T17:53:00.361018Z", "completed_at": "2023-03-28T17:53:00.371547Z"}], "thread_id": "Thread-1 (worker)", "execution_time": 0.013769149780273438, "adapter_response": {"_message": "OK"}, "message": "OK", "failures": null, "unique_id": "model.jaffle_shop.stg_payments"}, {"status": "success", "timing": [{"name": "compile", "started_at": "2023-03-28T17:53:00.374177Z", "completed_at": "2023-03-28T17:53:00.375799Z"}, {"name": "execute", "started_at": "2023-03-28T17:53:00.376001Z", "completed_at": "2023-03-28T17:53:00.399869Z"}], "thread_id": "Thread-1 (worker)", "execution_time": 0.02851414680480957, "adapter_response": {"_message": "OK"}, "message": "OK", "failures": null, "unique_id": "model.jaffle_shop.customers"}, {"status": "success", "timing": [{"name": "compile", "started_at": "2023-03-28T17:53:00.403579Z", "completed_at": "2023-03-28T17:53:00.405505Z"}, {"name": "execute", "started_at": "2023-03-28T17:53:00.405703Z", "completed_at": "2023-03-28T17:53:00.417432Z"}], "thread_id": "Thread-1 (worker)", "execution_time": 0.01594710350036621, "adapter_response": {"_message": "OK"}, "message": "OK", "failures": null, "unique_id": "model.jaffle_shop.orders"}], "elapsed_time": 0.17573904991149902, "args": {"write_json": true, "use_colors": true, "printer_width": 80, "version_check": true, "partial_parse": true, "static_parser": true, "profiles_dir": "/Users/dan/.dbt", "send_anonymous_usage_stats": true, "quiet": false, "no_print": false, "cache_selected_only": false, "target": "dev", "which": "run", "rpc_method": "run", "indirect_selection": "eager"}}
\ No newline at end of file
diff --git a/tests/sqeleton/__init__.py b/tests/sqeleton/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/sqeleton/common.py b/tests/sqeleton/common.py
new file mode 100644
index 00000000..03625da7
--- /dev/null
+++ b/tests/sqeleton/common.py
@@ -0,0 +1,158 @@
+import hashlib
+import os
+import string
+import random
+from typing import Callable
+import unittest
+import logging
+import subprocess
+
+from parameterized import parameterized_class
+
+import data_diff.sqeleton
+from data_diff.sqeleton import databases as db
+from data_diff.sqeleton.abcs.mixins import AbstractMixin_NormalizeValue
+from data_diff.sqeleton.queries import table
+from data_diff.sqeleton.databases import Database
+from data_diff.sqeleton.query_utils import drop_table
+from tests.common import (
+ TEST_MYSQL_CONN_STRING,
+ TEST_POSTGRESQL_CONN_STRING,
+ TEST_SNOWFLAKE_CONN_STRING,
+ TEST_PRESTO_CONN_STRING,
+ TEST_BIGQUERY_CONN_STRING,
+ TEST_REDSHIFT_CONN_STRING,
+ TEST_ORACLE_CONN_STRING,
+ TEST_DATABRICKS_CONN_STRING,
+ TEST_TRINO_CONN_STRING,
+ TEST_CLICKHOUSE_CONN_STRING,
+ TEST_VERTICA_CONN_STRING,
+ TEST_DUCKDB_CONN_STRING,
+ N_THREADS,
+ TEST_ACROSS_ALL_DBS,
+)
+
+
+def get_git_revision_short_hash() -> str:
+ return subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode("ascii").strip()
+
+
+GIT_REVISION = get_git_revision_short_hash()
+
+level = logging.ERROR
+if os.environ.get("LOG_LEVEL", False):
+ level = getattr(logging, os.environ["LOG_LEVEL"].upper())
+
+logging.basicConfig(level=level)
+logging.getLogger("database").setLevel(level)
+
+try:
+ from .local_settings import *
+except ImportError:
+ pass # No local settings
+
+
+CONN_STRINGS = {
+ db.BigQuery: TEST_BIGQUERY_CONN_STRING,
+ db.MySQL: TEST_MYSQL_CONN_STRING,
+ db.PostgreSQL: TEST_POSTGRESQL_CONN_STRING,
+ db.Snowflake: TEST_SNOWFLAKE_CONN_STRING,
+ db.Redshift: TEST_REDSHIFT_CONN_STRING,
+ db.Oracle: TEST_ORACLE_CONN_STRING,
+ db.Presto: TEST_PRESTO_CONN_STRING,
+ db.Databricks: TEST_DATABRICKS_CONN_STRING,
+ db.Trino: TEST_TRINO_CONN_STRING,
+ db.Clickhouse: TEST_CLICKHOUSE_CONN_STRING,
+ db.Vertica: TEST_VERTICA_CONN_STRING,
+ db.DuckDB: TEST_DUCKDB_CONN_STRING,
+}
+
+_database_instances = {}
+
+
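+# get_conn returns a database handle for the given driver class. With shared=True (the
+# default) the connection is cached per class so test cases can reuse it; the connect
+# factory is loaded with the NormalizeValue mixin so dialects expose normalize_* helpers.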
+def get_conn(cls: type, shared: bool = True) -> Database:
+ if shared:
+ if cls not in _database_instances:
+ _database_instances[cls] = get_conn(cls, shared=False)
+ return _database_instances[cls]
+
+ con = data_diff.sqeleton.connect.load_mixins(AbstractMixin_NormalizeValue)
+ return con(CONN_STRINGS[cls], N_THREADS)
+
+
+def _print_used_dbs():
+ used = {k.__name__ for k, v in CONN_STRINGS.items() if v is not None}
+ unused = {k.__name__ for k, v in CONN_STRINGS.items() if v is None}
+
+ print(f"Testing databases: {', '.join(used)}")
+ if unused:
+ logging.info(f"Connection not configured; skipping tests for: {', '.join(unused)}")
+ if TEST_ACROSS_ALL_DBS:
+ logging.info(
+            f"Full tests enabled (every db<->db); this may take a long time when many databases are configured. TEST_ACROSS_ALL_DBS={TEST_ACROSS_ALL_DBS}"
+ )
+
+
+_print_used_dbs()
+CONN_STRINGS = {k: v for k, v in CONN_STRINGS.items() if v is not None}
+
+
+def random_table_suffix() -> str:
+ char_set = string.ascii_lowercase + string.digits
+ suffix = "_"
+ suffix += "".join(random.choice(char_set) for _ in range(5))
+ return suffix
+
+
+def str_to_checksum(str: str):
+ # hello world
+ # => 5eb63bbbe01eeed093cb22bb8f5acdc3
+ # => cb22bb8f5acdc3
+ # => 273350391345368515
+ m = hashlib.md5()
+ m.update(str.encode("utf-8")) # encode to binary
+ md5 = m.hexdigest()
+    # Python slicing is 0-indexed, while the SQL substring used by the databases is 1-indexed (hence the +1 on the DB side)
+ half_pos = db.MD5_HEXDIGITS - db.CHECKSUM_HEXDIGITS
+ return int(md5[half_pos:], 16)
+
+
+class DbTestCase(unittest.TestCase):
+ "Sets up a table for testing"
+ db_cls = None
+ table1_schema = None
+ shared_connection = True
+
+ def setUp(self):
+ assert self.db_cls, self.db_cls
+
+ self.connection = get_conn(self.db_cls, self.shared_connection)
+
+ table_suffix = random_table_suffix()
+ self.table1_name = f"src{table_suffix}"
+
+ self.table1_path = self.connection.parse_table_name(self.table1_name)
+
+ drop_table(self.connection, self.table1_path)
+
+ self.src_table = table(self.table1_path, schema=self.table1_schema)
+ if self.table1_schema:
+ self.connection.query(self.src_table.create())
+
+ return super().setUp()
+
+ def tearDown(self):
+ drop_table(self.connection, self.table1_path)
+
+
+def _parameterized_class_per_conn(test_databases):
+ test_databases = set(test_databases)
+ names = [(cls.__name__, cls) for cls in CONN_STRINGS if cls in test_databases]
+ return parameterized_class(("name", "db_cls"), names)
+
+
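+# test_each_database_in_list builds a class decorator that, via parameterized_class,
+# generates one copy of the decorated TestCase per database with a configured connection
+# string, setting its `db_cls` attribute accordingly.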
+def test_each_database_in_list(databases) -> Callable:
+ def _test_per_database(cls):
+ return _parameterized_class_per_conn(databases)(cls)
+
+ return _test_per_database
diff --git a/tests/sqeleton/test_database.py b/tests/sqeleton/test_database.py
new file mode 100644
index 00000000..21b24643
--- /dev/null
+++ b/tests/sqeleton/test_database.py
@@ -0,0 +1,162 @@
+import unittest
+from datetime import datetime
+from typing import Callable, List, Tuple
+
+import pytz
+
+from data_diff.sqeleton import connect
+from data_diff.sqeleton import databases as dbs
+from data_diff.sqeleton.queries import table, current_timestamp, NormalizeAsString
+from tests.common import TEST_MYSQL_CONN_STRING
+from tests.sqeleton.common import str_to_checksum, test_each_database_in_list, get_conn, random_table_suffix
+from data_diff.sqeleton.abcs.database_types import TimestampTZ
+
+TEST_DATABASES = {
+ dbs.MySQL,
+ dbs.PostgreSQL,
+ dbs.Oracle,
+ dbs.Redshift,
+ dbs.Snowflake,
+ dbs.DuckDB,
+ dbs.BigQuery,
+ dbs.Presto,
+ dbs.Trino,
+ dbs.Vertica,
+}
+
+test_each_database: Callable = test_each_database_in_list(TEST_DATABASES)
+
+
+class TestDatabase(unittest.TestCase):
+ def setUp(self):
+ self.mysql = connect(TEST_MYSQL_CONN_STRING)
+
+ def test_connect_to_db(self):
+ self.assertEqual(1, self.mysql.query("SELECT 1", int))
+
+
+class TestMD5(unittest.TestCase):
+ def test_md5_as_int(self):
+ class MD5Dialect(dbs.mysql.Dialect, dbs.mysql.Mixin_MD5):
+ pass
+
+ self.mysql = connect(TEST_MYSQL_CONN_STRING)
+ self.mysql.dialect = MD5Dialect()
+
+ str = "hello world"
+ query_fragment = self.mysql.dialect.md5_as_int("'{0}'".format(str))
+ query = f"SELECT {query_fragment}"
+
+ self.assertEqual(str_to_checksum(str), self.mysql.query(query, int))
+
+
+class TestConnect(unittest.TestCase):
+ def test_bad_uris(self):
+ self.assertRaises(ValueError, connect, "p")
+ self.assertRaises(ValueError, connect, "postgresql:///bla/foo")
+ self.assertRaises(ValueError, connect, "snowflake://user:pass@foo/bar/TEST1")
+ self.assertRaises(ValueError, connect, "snowflake://user:pass@foo/bar/TEST1?warehouse=ha&schema=dup")
+
+
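+# Runs once per configured database: create a table, check that dialect.list_tables()
+# reports it, then drop it and check that it is gone.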
+@test_each_database
+class TestSchema(unittest.TestCase):
+ def test_table_list(self):
+ name = "tbl_" + random_table_suffix()
+ db = get_conn(self.db_cls)
+ tbl = table(db.parse_table_name(name), schema={"id": int})
+ q = db.dialect.list_tables(db.default_schema, name)
+ assert not db.query(q)
+
+ db.query(tbl.create())
+ self.assertEqual(db.query(q, List[str]), [name])
+
+ db.query(tbl.drop())
+ assert not db.query(q)
+
+ def test_type_mapping(self):
+ name = "tbl_" + random_table_suffix()
+ db = get_conn(self.db_cls)
+ tbl = table(db.parse_table_name(name), schema={
+ "int": int,
+ "float": float,
+ "datetime": datetime,
+ "str": str,
+ "bool": bool,
+ })
+ q = db.dialect.list_tables(db.default_schema, name)
+ assert not db.query(q)
+
+ db.query(tbl.create())
+ self.assertEqual(db.query(q, List[str]), [name])
+
+ db.query(tbl.drop())
+ assert not db.query(q)
+
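+# Query-execution tests: current_timestamp() should come back as a datetime, and a
+# timezone-aware value written in Europe/Berlin should normalize to the same string as
+# its UTC equivalent once the session timezone is set to UTC.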
+@test_each_database
+class TestQueries(unittest.TestCase):
+ def test_current_timestamp(self):
+ db = get_conn(self.db_cls)
+ res = db.query(current_timestamp(), datetime)
+ assert isinstance(res, datetime), (res, type(res))
+
+ def test_correct_timezone(self):
+ name = "tbl_" + random_table_suffix()
+ db = get_conn(self.db_cls)
+ tbl = table(name, schema={
+ "id": int, "created_at": TimestampTZ(9), "updated_at": TimestampTZ(9)
+ })
+
+ db.query(tbl.create())
+
+ tz = pytz.timezone('Europe/Berlin')
+
+ now = datetime.now(tz)
+ if isinstance(db, dbs.Presto):
+ ms = now.microsecond // 1000 * 1000 # Presto max precision is 3
+ now = now.replace(microsecond = ms)
+
+ db.query(table(name).insert_row(1, now, now))
+ db.query(db.dialect.set_timezone_to_utc())
+
+ t = db.table(name).query_schema()
+ t.schema["created_at"] = t.schema["created_at"].replace(precision=t.schema["created_at"].precision)
+
+ tbl = table(name, schema=t.schema)
+
+ results = db.query(tbl.select(NormalizeAsString(tbl[c]) for c in ["created_at", "updated_at"]), List[Tuple])
+
+        created_at = results[0][0]
+ updated_at = results[0][1]
+
+ utc = now.astimezone(pytz.UTC)
+ expected = utc.__format__("%Y-%m-%d %H:%M:%S.%f")
+
+
+ self.assertEqual(created_at, expected)
+ self.assertEqual(updated_at, expected)
+
+ db.query(tbl.drop())
+
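+# Checks that tables can be addressed with one-, two-, and three-part identifiers
+# (table, schema.table, database.schema.table) on the databases that support it.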
+@test_each_database
+class TestThreePartIds(unittest.TestCase):
+ def test_three_part_support(self):
+ if self.db_cls not in [dbs.PostgreSQL, dbs.Redshift, dbs.Snowflake, dbs.DuckDB]:
+ self.skipTest("Limited support for 3 part ids")
+
+ table_name = "tbl_" + random_table_suffix()
+ db = get_conn(self.db_cls)
+ db_res = db.query("SELECT CURRENT_DATABASE()")
+ schema_res = db.query("SELECT CURRENT_SCHEMA()")
+ db_name = db_res.rows[0][0]
+ schema_name = schema_res.rows[0][0]
+
+ table_one_part = table((table_name,), schema={"id": int})
+ table_two_part = table((schema_name, table_name), schema={"id": int})
+ table_three_part = table((db_name, schema_name, table_name), schema={"id": int})
+
+ for part in (table_one_part, table_two_part, table_three_part):
+ db.query(part.create())
+ d = db.query_table_schema(part.path)
+ assert len(d) == 1
+ db.query(part.drop())
+
diff --git a/tests/sqeleton/test_mixins.py b/tests/sqeleton/test_mixins.py
new file mode 100644
index 00000000..68ed80a4
--- /dev/null
+++ b/tests/sqeleton/test_mixins.py
@@ -0,0 +1,32 @@
+import unittest
+
+from data_diff.sqeleton import connect
+
+from data_diff.sqeleton.abcs import AbstractDialect, AbstractDatabase
+from data_diff.sqeleton.abcs.mixins import AbstractMixin_NormalizeValue, AbstractMixin_RandomSample, AbstractMixin_TimeTravel
+
+
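+# Verifies that connect.load_mixins() returns a new connect factory whose dialects gain
+# the requested mixin methods, while the original `connect` is left unchanged.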
+class TestMixins(unittest.TestCase):
+ def test_normalize(self):
+ # - Test sanity
+ ddb1 = connect("duckdb://:memory:")
+ assert not hasattr(ddb1.dialect, "normalize_boolean")
+
+ # - Test abstract mixins
+ class NewAbstractDialect(AbstractDialect, AbstractMixin_NormalizeValue, AbstractMixin_RandomSample):
+ pass
+
+ new_connect = connect.load_mixins(AbstractMixin_NormalizeValue, AbstractMixin_RandomSample)
+ ddb2: AbstractDatabase[NewAbstractDialect] = new_connect("duckdb://:memory:")
+ # Implementation may change; Just update the test
+ assert ddb2.dialect.normalize_boolean("bool", None) == "bool::INTEGER::VARCHAR"
+ assert ddb2.dialect.random_sample_n("x", 10)
+
+ # - Test immutability
+ ddb3 = connect("duckdb://:memory:")
+ assert not hasattr(ddb3.dialect, "normalize_boolean")
+
+ self.assertRaises(TypeError, connect.load_mixins, AbstractMixin_TimeTravel)
+
+ new_connect = connect.for_databases("bigquery", "snowflake").load_mixins(AbstractMixin_TimeTravel)
+ self.assertRaises(NotImplementedError, new_connect, "duckdb://:memory:")
diff --git a/tests/sqeleton/test_query.py b/tests/sqeleton/test_query.py
new file mode 100644
index 00000000..efc41c02
--- /dev/null
+++ b/tests/sqeleton/test_query.py
@@ -0,0 +1,320 @@
+from datetime import datetime
+from typing import List, Optional
+import unittest
+from data_diff.sqeleton.abcs import AbstractDatabase, AbstractDialect
+from data_diff.sqeleton.utils import CaseInsensitiveDict, CaseSensitiveDict
+
+from data_diff.sqeleton.queries import this, table, Compiler, outerjoin, cte, when, coalesce, CompileError
+from data_diff.sqeleton.queries.ast_classes import Random
+from data_diff.sqeleton import code, this, table
+
+
+def normalize_spaces(s: str):
+ return " ".join(s.split())
+
+
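+# MockDialect implements just enough of AbstractDialect for the Compiler to render SQL
+# text in these tests; no database connection is involved.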
+class MockDialect(AbstractDialect):
+ name = "MockDialect"
+
+ PLACEHOLDER_TABLE = None
+ ROUNDS_ON_PREC_LOSS = False
+
+ def quote(self, s: str) -> str:
+ return s
+
+ def concat(self, l: List[str]) -> str:
+ s = ", ".join(l)
+ return f"concat({s})"
+
+ def to_comparable(self, s: str) -> str:
+ return s
+
+ def to_string(self, s: str) -> str:
+ return f"cast({s} as varchar)"
+
+ def is_distinct_from(self, a: str, b: str) -> str:
+ return f"{a} is distinct from {b}"
+
+ def random(self) -> str:
+ return "random()"
+
+ def current_timestamp(self) -> str:
+ return "now()"
+
+ def offset_limit(self, offset: Optional[int] = None, limit: Optional[int] = None):
+ x = offset and f"OFFSET {offset}", limit and f"LIMIT {limit}"
+ return " ".join(filter(None, x))
+
+ def explain_as_text(self, query: str) -> str:
+ return f"explain {query}"
+
+ def timestamp_value(self, t: datetime) -> str:
+ return f"timestamp '{t}'"
+
+ def set_timezone_to_utc(self) -> str:
+ return "set timezone 'UTC'"
+
+ def optimizer_hints(self, s: str):
+ return f"/*+ {s} */ "
+
+ def load_mixins(self):
+ raise NotImplementedError()
+
+ parse_type = NotImplemented
+
+
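+# MockDatabase stubs the abstract query/schema methods with NotImplemented, since query
+# compilation never calls them; only `dialect` is exercised by the Compiler.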
+class MockDatabase(AbstractDatabase):
+ dialect = MockDialect()
+
+ _query = NotImplemented
+ query_table_schema = NotImplemented
+ select_table_schema = NotImplemented
+ _process_table_schema = NotImplemented
+ parse_table_name = NotImplemented
+ close = NotImplemented
+ _normalize_table_path = NotImplemented
+ is_autocommit = NotImplemented
+
+
+class TestQuery(unittest.TestCase):
+ def setUp(self):
+ pass
+
+ def test_basic(self):
+ c = Compiler(MockDatabase())
+
+ t = table("point")
+ t2 = t.select(x=this.x + 1, y=t["y"] + this.x)
+ assert c.compile(t2) == "SELECT (x + 1) AS x, (y + x) AS y FROM point"
+
+ t = table("point").where(this.x == 1, this.y == 2)
+ assert c.compile(t) == "SELECT * FROM point WHERE (x = 1) AND (y = 2)"
+
+ t = table("person").where(this.name == "Albert")
+ self.assertEqual(c.compile(t), "SELECT * FROM person WHERE (name = 'Albert')")
+
+ def test_outerjoin(self):
+ c = Compiler(MockDatabase())
+
+ a = table("a")
+ b = table("b")
+ keys = ["x", "y"]
+ cols = ["u", "v"]
+
+ j = outerjoin(a, b).on(a[k] == b[k] for k in keys)
+
+ self.assertEqual(
+ c.compile(j), "SELECT * FROM a tmp1 FULL OUTER JOIN b tmp2 ON (tmp1.x = tmp2.x) AND (tmp1.y = tmp2.y)"
+ )
+
+ def test_schema(self):
+ c = Compiler(MockDatabase())
+ schema = dict(id="int", comment="varchar")
+
+ # test table
+ t = table("a", schema=CaseInsensitiveDict(schema))
+ q = t.select(this.Id, t["COMMENT"])
+ assert c.compile(q) == "SELECT id, comment FROM a"
+
+ t = table("a", schema=CaseSensitiveDict(schema))
+ self.assertRaises(KeyError, t.__getitem__, "Id")
+ self.assertRaises(KeyError, t.select, this.Id)
+
+ # test select
+ q = t.select(this.id)
+ self.assertRaises(KeyError, q.__getitem__, "comment")
+
+ # test join
+ s = CaseInsensitiveDict({"x": int, "y": int})
+ a = table("a", schema=s)
+ b = table("b", schema=s)
+ keys = ["x", "y"]
+ j = outerjoin(a, b).on(a[k] == b[k] for k in keys).select(a["x"], b["y"], xsum=a["x"] + b["x"])
+ j["x"], j["y"], j["xsum"]
+ self.assertRaises(KeyError, j.__getitem__, "ysum")
+
+ def test_commutable_select(self):
+ # c = Compiler(MockDatabase())
+
+ t = table("a")
+ q1 = t.select("a").where("b")
+ q2 = t.where("b").select("a")
+ assert q1 == q2, (q1, q2)
+
+ def test_cte(self):
+ c = Compiler(MockDatabase())
+
+ t = table("a")
+
+ # single cte
+ t2 = cte(t.select(this.x))
+ t3 = t2.select(this.x)
+
+ expected = "WITH tmp1 AS (SELECT x FROM a) SELECT x FROM tmp1"
+ assert normalize_spaces(c.compile(t3)) == expected
+
+ # nested cte
+ c = Compiler(MockDatabase())
+ t4 = cte(t3).select(this.x)
+
+ expected = "WITH tmp1 AS (SELECT x FROM a), tmp2 AS (SELECT x FROM tmp1) SELECT x FROM tmp2"
+ assert normalize_spaces(c.compile(t4)) == expected
+
+ # parameterized cte
+ c = Compiler(MockDatabase())
+ t2 = cte(t.select(this.x), params=["y"])
+ t3 = t2.select(this.y)
+
+ expected = "WITH tmp1(y) AS (SELECT x FROM a) SELECT y FROM tmp1"
+ assert normalize_spaces(c.compile(t3)) == expected
+
+ def test_funcs(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.order_by(Random()).limit(10))
+ self.assertEqual(q, "SELECT * FROM a ORDER BY random() LIMIT 10")
+
+ q = c.compile(t.select(coalesce(this.a, this.b)))
+ self.assertEqual(q, "SELECT COALESCE(a, b) FROM a")
+
+ def test_select_distinct(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.select(this.b, distinct=True))
+ assert q == "SELECT DISTINCT b FROM a"
+
+ # selects merge
+ q = c.compile(t.where(this.b > 10).select(this.b, distinct=True))
+ self.assertEqual(q, "SELECT DISTINCT b FROM a WHERE (b > 10)")
+
+ # selects stay apart
+ q = c.compile(t.limit(10).select(this.b, distinct=True))
+ self.assertEqual(q, "SELECT DISTINCT b FROM (SELECT * FROM a LIMIT 10) tmp1")
+
+ q = c.compile(t.select(this.b, distinct=True).select(distinct=False))
+ self.assertEqual(q, "SELECT * FROM (SELECT DISTINCT b FROM a) tmp2")
+
+ def test_select_with_optimizer_hints(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.select(this.b, optimizer_hints="PARALLEL(a 16)"))
+ assert q == "SELECT /*+ PARALLEL(a 16) */ b FROM a"
+
+ q = c.compile(t.where(this.b > 10).select(this.b, optimizer_hints="PARALLEL(a 16)"))
+ self.assertEqual(q, "SELECT /*+ PARALLEL(a 16) */ b FROM a WHERE (b > 10)")
+
+ q = c.compile(t.limit(10).select(this.b, optimizer_hints="PARALLEL(a 16)"))
+ self.assertEqual(q, "SELECT /*+ PARALLEL(a 16) */ b FROM (SELECT * FROM a LIMIT 10) tmp1")
+
+ q = c.compile(t.select(this.a).group_by(this.b).agg(this.c).select(optimizer_hints="PARALLEL(a 16)"))
+ self.assertEqual(
+ q, "SELECT /*+ PARALLEL(a 16) */ * FROM (SELECT b, c FROM (SELECT a FROM a) tmp2 GROUP BY 1) tmp3"
+ )
+
+ def test_table_ops(self):
+ c = Compiler(MockDatabase())
+ a = table("a").select(this.x)
+ b = table("b").select(this.y)
+
+ q = c.compile(a.union(b))
+ assert q == "SELECT x FROM a UNION SELECT y FROM b"
+
+ q = c.compile(a.union_all(b))
+ assert q == "SELECT x FROM a UNION ALL SELECT y FROM b"
+
+ q = c.compile(a.minus(b))
+ assert q == "SELECT x FROM a EXCEPT SELECT y FROM b"
+
+ q = c.compile(a.intersect(b))
+ assert q == "SELECT x FROM a INTERSECT SELECT y FROM b"
+
+ def test_ops(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.select(this.b + this.c))
+ self.assertEqual(q, "SELECT (b + c) FROM a")
+
+ q = c.compile(t.select(this.b.like(this.c)))
+ self.assertEqual(q, "SELECT (b LIKE c) FROM a")
+
+ q = c.compile(t.select(-this.b.sum()))
+ self.assertEqual(q, "SELECT (-SUM(b)) FROM a")
+
+ def test_group_by(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.group_by(this.b).agg(this.c))
+ self.assertEqual(q, "SELECT b, c FROM a GROUP BY 1")
+
+ q = c.compile(t.where(this.b > 1).group_by(this.b).agg(this.c))
+ self.assertEqual(q, "SELECT b, c FROM a WHERE (b > 1) GROUP BY 1")
+
+ self.assertRaises(CompileError, c.compile, t.select(this.b).group_by(this.b))
+
+ q = c.compile(t.select(this.b).group_by(this.b).agg())
+ self.assertEqual(q, "SELECT b FROM (SELECT b FROM a) tmp1 GROUP BY 1")
+
+ q = c.compile(t.group_by(this.b, this.c).agg(this.d, this.e))
+ self.assertEqual(q, "SELECT b, c, d, e FROM a GROUP BY 1, 2")
+
+ # Having
+ q = c.compile(t.group_by(this.b).agg(this.c).having(this.b > 1))
+ self.assertEqual(q, "SELECT b, c FROM a GROUP BY 1 HAVING (b > 1)")
+
+ q = c.compile(t.group_by(this.b).having(this.b > 1).agg(this.c))
+ self.assertEqual(q, "SELECT b, c FROM a GROUP BY 1 HAVING (b > 1)")
+
+ q = c.compile(t.select(this.b).group_by(this.b).agg().having(this.b > 1))
+ self.assertEqual(q, "SELECT b FROM (SELECT b FROM a) tmp2 GROUP BY 1 HAVING (b > 1)")
+
+ # Having sum
+ q = c.compile(t.group_by(this.b).agg(this.c, this.d).having(this.b.sum() > 1))
+ self.assertEqual(q, "SELECT b, c, d FROM a GROUP BY 1 HAVING (SUM(b) > 1)")
+
+ # Select interaction
+ q = c.compile(t.select(this.a).group_by(this.b).agg(this.c).select(this.c + 1))
+ self.assertEqual(q, "SELECT (c + 1) FROM (SELECT b, c FROM (SELECT a FROM a) tmp3 GROUP BY 1) tmp4")
+
+ def test_case_when(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.select(when(this.b).then(this.c)))
+ self.assertEqual(q, "SELECT CASE WHEN b THEN c END FROM a")
+
+ q = c.compile(t.select(when(this.b).then(this.c).else_(this.d)))
+ self.assertEqual(q, "SELECT CASE WHEN b THEN c ELSE d END FROM a")
+
+ q = c.compile(
+ t.select(
+ when(this.type == "text")
+ .then(this.text)
+ .when(this.type == "number")
+ .then(this.number)
+ .else_("unknown type")
+ )
+ )
+ self.assertEqual(
+ q,
+ "SELECT CASE WHEN (type = 'text') THEN text WHEN (type = 'number') THEN number ELSE 'unknown type' END FROM a",
+ )
+
+ def test_code(self):
+ c = Compiler(MockDatabase())
+ t = table("a")
+
+ q = c.compile(t.select(this.b, code("")).where(code("")))
+ self.assertEqual(q, "SELECT b, FROM a WHERE ")
+
+ def tablesample(t, size):
+ return code("{t} TABLESAMPLE BERNOULLI ({size})", t=t, size=size)
+
+ nonzero = table("points").where(this.x > 0, this.y > 0)
+
+ q = c.compile(tablesample(nonzero, 10))
+ self.assertEqual(q, "SELECT * FROM points WHERE (x > 0) AND (y > 0) TABLESAMPLE BERNOULLI (10)")
diff --git a/tests/sqeleton/test_sql.py b/tests/sqeleton/test_sql.py
new file mode 100644
index 00000000..d8e07046
--- /dev/null
+++ b/tests/sqeleton/test_sql.py
@@ -0,0 +1,106 @@
+import unittest
+
+from tests.common import TEST_MYSQL_CONN_STRING
+
+from data_diff.sqeleton import connect
+from data_diff.sqeleton.queries import Compiler, Count, Explain, Select, table, In, BinOp, Code
+
+
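+# These tests compile query ASTs with a Compiler bound to a MySQL connection and assert
+# on the exact SQL text produced; nothing is executed against the database.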
+class TestSQL(unittest.TestCase):
+ def setUp(self):
+ self.mysql = connect(TEST_MYSQL_CONN_STRING)
+ self.compiler = Compiler(self.mysql)
+
+ def test_compile_string(self):
+ self.assertEqual("SELECT 1", self.compiler.compile(Code("SELECT 1")))
+
+ def test_compile_int(self):
+ self.assertEqual("1", self.compiler.compile(1))
+
+ def test_compile_table_name(self):
+ self.assertEqual(
+ "`marine_mammals`.`walrus`", self.compiler.replace(root=False).compile(table("marine_mammals", "walrus"))
+ )
+
+ def test_compile_select(self):
+ expected_sql = "SELECT name FROM `marine_mammals`.`walrus`"
+ self.assertEqual(
+ expected_sql,
+ self.compiler.compile(
+ Select(
+ table("marine_mammals", "walrus"),
+ [Code("name")],
+ )
+ ),
+ )
+
+ # def test_enum(self):
+ # expected_sql = "(SELECT *, (row_number() over (ORDER BY id)) as idx FROM `walrus` ORDER BY id) tmp"
+ # self.assertEqual(
+ # expected_sql,
+ # self.compiler.compile(
+ # Enum(
+ # ("walrus",),
+ # "id",
+ # )
+ # ),
+ # )
+
+ # def test_checksum(self):
+ # expected_sql = "SELECT name, sum(cast(conv(substring(md5(concat(cast(id as char), cast(timestamp as char))), 18), 16, 10) as unsigned)) FROM `marine_mammals`.`walrus`"
+ # self.assertEqual(
+ # expected_sql,
+ # self.compiler.compile(
+ # Select(
+ # ["name", Checksum(["id", "timestamp"])],
+ # TableName(("marine_mammals", "walrus")),
+ # )
+ # ),
+ # )
+
+ def test_compare(self):
+ expected_sql = "SELECT name FROM `marine_mammals`.`walrus` WHERE (id <= 1000) AND (id > 1)"
+ self.assertEqual(
+ expected_sql,
+ self.compiler.compile(
+ Select(
+ table("marine_mammals", "walrus"),
+ [Code("name")],
+ [BinOp("<=", [Code("id"), Code("1000")]), BinOp(">", [Code("id"), Code("1")])],
+ )
+ ),
+ )
+
+ def test_in(self):
+ expected_sql = "SELECT name FROM `marine_mammals`.`walrus` WHERE (id IN (1, 2, 3))"
+ self.assertEqual(
+ expected_sql,
+ self.compiler.compile(
+ Select(table("marine_mammals", "walrus"), [Code("name")], [In(Code("id"), [1, 2, 3])])
+ ),
+ )
+
+ def test_count(self):
+ expected_sql = "SELECT count(*) FROM `marine_mammals`.`walrus` WHERE (id IN (1, 2, 3))"
+ self.assertEqual(
+ expected_sql,
+ self.compiler.compile(Select(table("marine_mammals", "walrus"), [Count()], [In(Code("id"), [1, 2, 3])])),
+ )
+
+ def test_count_with_column(self):
+ expected_sql = "SELECT count(id) FROM `marine_mammals`.`walrus` WHERE (id IN (1, 2, 3))"
+ self.assertEqual(
+ expected_sql,
+ self.compiler.compile(
+ Select(table("marine_mammals", "walrus"), [Count(Code("id"))], [In(Code("id"), [1, 2, 3])])
+ ),
+ )
+
+ def test_explain(self):
+ expected_sql = "EXPLAIN FORMAT=TREE SELECT count(id) FROM `marine_mammals`.`walrus` WHERE (id IN (1, 2, 3))"
+ self.assertEqual(
+ expected_sql,
+ self.compiler.compile(
+ Explain(Select(table("marine_mammals", "walrus"), [Count(Code("id"))], [In(Code("id"), [1, 2, 3])]))
+ ),
+ )
diff --git a/tests/sqeleton/test_utils.py b/tests/sqeleton/test_utils.py
new file mode 100644
index 00000000..25ec9c39
--- /dev/null
+++ b/tests/sqeleton/test_utils.py
@@ -0,0 +1,104 @@
+import unittest
+
+from data_diff.sqeleton.utils import remove_passwords_in_dict, match_regexps, match_like, number_to_human, WeakCache
+
+
+class TestUtils(unittest.TestCase):
+ def test_remove_passwords_in_dict(self):
+ # Test replacing password value
+ d = {"password": "mypassword"}
+ remove_passwords_in_dict(d)
+ assert d["password"] == "***"
+
+ # Test replacing password in database URL
+ d = {"database_url": "mysql://user:mypassword@localhost/db"}
+ remove_passwords_in_dict(d, "$$$$")
+ assert d["database_url"] == "mysql://user:$$$$@localhost/db"
+
+ # Test replacing password in nested dictionary
+ d = {"info": {"password": "mypassword"}}
+ remove_passwords_in_dict(d, "%%")
+ assert d["info"]["password"] == "%%"
+
+ def test_match_regexps(self):
+ def only_results(x):
+ return [v for k, v in x]
+
+ # Test with no matches
+ regexps = {"a*": 1, "b*": 2}
+ s = "c"
+ assert only_results(match_regexps(regexps, s)) == []
+
+ # Test with one match
+ regexps = {"a*": 1, "b*": 2}
+ s = "b"
+ assert only_results(match_regexps(regexps, s)) == [2]
+
+ # Test with multiple matches
+ regexps = {"abc": 1, "ab*c": 2, "c*": 3}
+ s = "abc"
+ assert only_results(match_regexps(regexps, s)) == [1, 2]
+
+ # Test with regexp that doesn't match the end of the string
+ regexps = {"a*b": 1}
+ s = "acb"
+ assert only_results(match_regexps(regexps, s)) == []
+
+ def test_match_like(self):
+ strs = ["abc", "abcd", "ab", "bcd", "def"]
+
+ # Test exact match
+ pattern = "abc"
+ result = list(match_like(pattern, strs))
+ assert result == ["abc"]
+
+ # Test % match
+ pattern = "a%"
+ result = list(match_like(pattern, strs))
+ self.assertEqual(result, ["abc", "abcd", "ab"])
+
+ # Test ? match
+ pattern = "a?c"
+ result = list(match_like(pattern, strs))
+ self.assertEqual(result, ["abc"])
+
+ def test_number_to_human(self):
+ # Test basic conversion
+ assert number_to_human(1000) == "1k"
+ assert number_to_human(1000000) == "1m"
+ assert number_to_human(1000000000) == "1b"
+
+ # Test decimal values
+ assert number_to_human(1234) == "1k"
+ assert number_to_human(12345) == "12k"
+ assert number_to_human(123456) == "123k"
+ assert number_to_human(1234567) == "1m"
+ assert number_to_human(12345678) == "12m"
+ assert number_to_human(123456789) == "123m"
+ assert number_to_human(1234567890) == "1b"
+
+ # Test negative values
+ assert number_to_human(-1000) == "-1k"
+ assert number_to_human(-1000000) == "-1m"
+ assert number_to_human(-1000000000) == "-1b"
+
+ def test_weak_cache(self):
+ # Create cache
+ cache = WeakCache()
+
+ # Test adding and retrieving basic value
+ o = {1, 2}
+ cache.add("key", o)
+ assert cache.get("key") is o
+
+ # Test adding and retrieving dict value
+ cache.add({"key": "value"}, o)
+ assert cache.get({"key": "value"}) is o
+
+ # Test deleting value when reference is lost
+ del o
+ try:
+ cache.get({"key": "value"})
+ assert False, "KeyError should have been raised"
+ except KeyError:
+ pass
diff --git a/tests/test_api.py b/tests/test_api.py
index a48a8899..88af2dbf 100644
--- a/tests/test_api.py
+++ b/tests/test_api.py
@@ -2,7 +2,7 @@
from data_diff import diff_tables, connect_to_table, Algorithm
from data_diff.databases import MySQL
-from sqeleton.queries import table, commit
+from data_diff.sqeleton.queries import table, commit
from .common import TEST_MYSQL_CONN_STRING, get_conn, random_table_suffix, DiffTestCase
diff --git a/tests/test_cli.py b/tests/test_cli.py
index 024bc15c..898087d5 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -3,7 +3,7 @@
import sys
from datetime import datetime, timedelta
-from sqeleton.queries import commit, current_timestamp
+from data_diff.sqeleton.queries import commit, current_timestamp
from .common import DiffTestCase, CONN_STRINGS
from .test_diff_tables import test_each_database
diff --git a/tests/test_database_types.py b/tests/test_database_types.py
index 70ecebad..fdf8784d 100644
--- a/tests/test_database_types.py
+++ b/tests/test_database_types.py
@@ -13,9 +13,9 @@
from parameterized import parameterized
-from sqeleton.utils import number_to_human
-from sqeleton.queries import table, commit, this, Code
-from sqeleton.queries.api import insert_rows_in_batches
+from data_diff.sqeleton.utils import number_to_human
+from data_diff.sqeleton.queries import table, commit, this, Code
+from data_diff.sqeleton.queries.api import insert_rows_in_batches
from data_diff import databases as db
from data_diff.query_utils import drop_table
@@ -74,6 +74,10 @@ def init_conns():
"boolean": [
"boolean",
],
+ "json": [
+ "json",
+ "jsonb"
+ ]
},
db.MySQL: {
# https://dev.mysql.com/doc/refman/8.0/en/integer-types.html
@@ -199,6 +203,9 @@ def init_conns():
"boolean": [
"boolean",
],
+ "json": [
+ "super",
+ ]
},
db.Oracle: {
"int": [
@@ -469,12 +476,28 @@ def __iter__(self):
return (uuid.uuid1(i) for i in range(self.max))
+class JsonFaker:
+ MANUAL_FAKES = [
+ '{"keyText": "text", "keyInt": 3, "keyFloat": 5.4445, "keyBoolean": true}',
+ ]
+
+ def __init__(self, max):
+ self.max = max
+
+ def __iter__(self):
+ return iter(self.MANUAL_FAKES[: self.max])
+
+ def __len__(self):
+ return min(self.max, len(self.MANUAL_FAKES))
+
+
TYPE_SAMPLES = {
"int": IntFaker(N_SAMPLES),
"datetime": DateTimeFaker(N_SAMPLES),
"float": FloatFaker(N_SAMPLES),
"uuid": UUID_Faker(N_SAMPLES),
"boolean": BooleanFaker(N_SAMPLES),
+ "json": JsonFaker(N_SAMPLES)
}
@@ -503,16 +526,17 @@ def get_test_db_pairs():
for source_db, source_type_categories, target_db, target_type_categories in get_test_db_pairs():
for type_category, source_types in source_type_categories.items(): # int, datetime, ..
for source_type in source_types:
- for target_type in target_type_categories[type_category]:
- type_pairs.append(
- (
- source_db,
- target_db,
- source_type,
- target_type,
- type_category,
+ if type_category in target_type_categories: # only cross-compatible types
+ for target_type in target_type_categories[type_category]:
+ type_pairs.append(
+ (
+ source_db,
+ target_db,
+ source_type,
+ target_type,
+ type_category,
+ )
)
- )
def sanitize(name):
@@ -546,7 +570,7 @@ def expand_params(testcase_func, param_num, param):
return name
-def _insert_to_table(conn, table_path, values, type):
+def _insert_to_table(conn, table_path, values, coltype):
tbl = table(table_path)
current_n_rows = conn.query(tbl.count(), int)
@@ -555,31 +579,41 @@ def _insert_to_table(conn, table_path, values, type):
return
elif current_n_rows > 0:
conn.query(drop_table(table_name))
- _create_table_with_indexes(conn, table_path, type)
+ _create_table_with_indexes(conn, table_path, coltype)
# if BENCHMARK and N_SAMPLES > 10_000:
# description = f"{conn.name}: {table}"
# values = rich.progress.track(values, total=N_SAMPLES, description=description)
- if type == "boolean":
+ if coltype == "boolean":
values = [(i, bool(sample)) for i, sample in values]
- elif re.search(r"(time zone|tz)", type):
+ elif re.search(r"(time zone|tz)", coltype):
values = [(i, sample.replace(tzinfo=timezone.utc)) for i, sample in values]
if isinstance(conn, db.Clickhouse):
- if type.startswith("DateTime64"):
+ if coltype.startswith("DateTime64"):
values = [(i, f"{sample.replace(tzinfo=None)}") for i, sample in values]
- elif type == "DateTime":
+ elif coltype == "DateTime":
# Clickhouse's DateTime does not allow to store micro/milli/nano seconds
values = [(i, str(sample)[:19]) for i, sample in values]
- elif type.startswith("Decimal("):
- precision = int(type[8:].rstrip(")").split(",")[1])
+ elif coltype.startswith("Decimal("):
+ precision = int(coltype[8:].rstrip(")").split(",")[1])
values = [(i, round(sample, precision)) for i, sample in values]
- elif isinstance(conn, db.BigQuery) and type == "datetime":
+ elif isinstance(conn, db.BigQuery) and coltype == "datetime":
values = [(i, Code(f"cast(timestamp '{sample}' as datetime)")) for i, sample in values]
+ elif isinstance(conn, db.Redshift) and coltype in ("json", "jsonb"):
+ values = [(i, Code(f"JSON_PARSE({sample})")) for i, sample in values]
+ elif isinstance(conn, db.PostgreSQL) and coltype in ("json", "jsonb"):
+ values = [(i, Code(
+ "'{}'".format(
+ (json.dumps(sample) if isinstance(sample, (dict, list)) else sample)
+ .replace('\'', '\'\'')
+ )
+ )) for i, sample in values]
+
insert_rows_in_batches(conn, tbl, values, columns=["id", "col"])
conn.query(commit)
@@ -601,9 +635,10 @@ def _create_table_with_indexes(conn, table_path, type_):
else:
conn.query(tbl.create())
- if conn.dialect.SUPPORTS_INDEXES:
- (index_id,) = table_path
+ (index_id,) = table_path
+ if conn.dialect.SUPPORTS_INDEXES and type_ not in ('json', 'jsonb', 'array', 'struct'):
conn.query(f"CREATE INDEX xa_{index_id} ON {table_name} ({quote('id')}, {quote('col')})")
+ if conn.dialect.SUPPORTS_INDEXES:
conn.query(f"CREATE INDEX xb_{index_id} ON {table_name} ({quote('id')})")
conn.query(commit)
@@ -698,9 +733,11 @@ def test_types(self, source_db, target_db, source_type, target_type, type_catego
checksum_duration = time.monotonic() - start
expected = []
self.assertEqual(expected, diff)
- self.assertEqual(
- 0, differ.stats.get("rows_downloaded", 0)
- ) # This may fail if the hash is different, but downloaded values are equal
+
+ # For fuzzily diffed types, some rows can be downloaded for local comparison. This happens
+        # when hashes are different but the essential payload is not, e.g. due to JSON serialization.
+ if not {source_type, target_type} & {'json', 'jsonb', 'array', 'struct'}:
+ self.assertEqual(0, differ.stats.get("rows_downloaded", 0))
# This section downloads all rows to ensure that Python agrees with the
# database, in terms of comparison.
diff --git a/tests/test_dbt.py b/tests/test_dbt.py
index 376c7e5c..cf6fc3d2 100644
--- a/tests/test_dbt.py
+++ b/tests/test_dbt.py
@@ -6,14 +6,17 @@
from .test_cli import run_datadiff_cli
from data_diff.dbt import (
+ _get_diff_vars,
dbt_diff,
_local_diff,
_cloud_diff,
DbtParser,
+ DiffVars,
+ DatafoldAPI,
+)
+from data_diff.dbt_parser import (
RUN_RESULTS_PATH,
- MANIFEST_PATH,
PROJECT_FILE,
- DiffVars,
)
import unittest
from unittest.mock import MagicMock, Mock, mock_open, patch, ANY
@@ -48,15 +51,64 @@ def test_get_datadiff_variables_empty(self):
with self.assertRaises(Exception):
DbtParser.get_datadiff_variables(mock_self)
+ def test_get_models(self):
+ mock_self = Mock()
+ mock_self.project_dir = Path()
+ mock_self.dbt_version = "1.5.0"
+ selection = "model+"
+ mock_return_value = Mock()
+ mock_self.get_dbt_selection_models.return_value = mock_return_value
+
+ models = DbtParser.get_models(mock_self, selection)
+ mock_self.get_dbt_selection_models.assert_called_once_with(selection)
+ self.assertEqual(models, mock_return_value)
+
+ def test_get_models_unsupported_manifest_version(self):
+ mock_self = Mock()
+ mock_self.project_dir = Path()
+ mock_self.dbt_version = "1.4.0"
+ selection = "model+"
+ mock_return_value = Mock()
+ mock_self.get_dbt_selection_models.return_value = mock_return_value
+
+ with self.assertRaises(Exception):
+ _ = DbtParser.get_models(mock_self, selection)
+ mock_self.get_dbt_selection_models.assert_not_called()
+
+ def test_get_models_no_runner(self):
+ mock_self = Mock()
+ mock_self.project_dir = Path()
+ mock_self.dbt_version = "1.5.0"
+ mock_self.dbt_runner = None
+ selection = "model+"
+ mock_return_value = Mock()
+ mock_self.get_dbt_selection_models.return_value = mock_return_value
+
+ with self.assertRaises(Exception):
+ _ = DbtParser.get_models(mock_self, selection)
+ mock_self.get_dbt_selection_models.assert_not_called()
+
+ def test_get_models_no_selection(self):
+ mock_self = Mock()
+ mock_self.project_dir = Path()
+ mock_self.dbt_version = "1.5.0"
+ selection = None
+ mock_return_value = Mock()
+ mock_self.get_run_results_models.return_value = mock_return_value
+
+ models = DbtParser.get_models(mock_self, selection)
+ mock_self.get_dbt_selection_models.assert_not_called()
+ mock_self.get_run_results_models.assert_called()
+ self.assertEqual(models, mock_return_value)
+
@patch("builtins.open", new_callable=mock_open, read_data="{}")
- def test_get_models(self, mock_open):
- expected_value = "expected_value"
+ def test_get_run_results_models(self, mock_open):
+ mock_model = {"success_unique_id": "expected_value"}
mock_self = Mock()
mock_self.project_dir = Path()
mock_run_results = Mock()
mock_success_result = Mock()
mock_failed_result = Mock()
- mock_manifest = Mock()
mock_self.parse_run_results.return_value = mock_run_results
mock_run_results.metadata.dbt_version = "1.0.0"
mock_success_result.unique_id = "success_unique_id"
@@ -64,19 +116,16 @@ def test_get_models(self, mock_open):
mock_success_result.status.name = "success"
mock_failed_result.status.name = "failed"
mock_run_results.results = [mock_success_result, mock_failed_result]
- mock_self.parse_manifest.return_value = mock_manifest
- mock_manifest.nodes = {"success_unique_id": expected_value}
+ mock_self.manifest_obj.nodes.get.return_value = mock_model
- models = DbtParser.get_models(mock_self)
+ models = DbtParser.get_run_results_models(mock_self)
- self.assertEqual(expected_value, models[0])
+ self.assertEqual(mock_model, models[0])
mock_open.assert_any_call(Path(RUN_RESULTS_PATH))
- mock_open.assert_any_call(Path(MANIFEST_PATH))
mock_self.parse_run_results.assert_called_once_with(run_results={})
- mock_self.parse_manifest.assert_called_once_with(manifest={})
@patch("builtins.open", new_callable=mock_open, read_data="{}")
- def test_get_models_bad_lower_dbt_version(self, mock_open):
+ def test_get_run_results_models_bad_lower_dbt_version(self, mock_open):
mock_self = Mock()
mock_self.project_dir = Path()
mock_run_results = Mock()
@@ -84,7 +133,7 @@ def test_get_models_bad_lower_dbt_version(self, mock_open):
mock_run_results.metadata.dbt_version = "0.19.0"
with self.assertRaises(Exception) as ex:
- DbtParser.get_models(mock_self)
+ DbtParser.get_run_results_models(mock_self)
mock_open.assert_called_once_with(Path(RUN_RESULTS_PATH))
mock_self.parse_run_results.assert_called_once_with(run_results={})
@@ -92,63 +141,42 @@ def test_get_models_bad_lower_dbt_version(self, mock_open):
self.assertIn("version to be", ex.exception.args[0])
@patch("builtins.open", new_callable=mock_open, read_data="{}")
- def test_get_models_bad_upper_dbt_version(self, mock_open):
- mock_self = Mock()
- mock_self.project_dir = Path()
- mock_run_results = Mock()
- mock_self.parse_run_results.return_value = mock_run_results
- mock_run_results.metadata.dbt_version = "1.5.1"
-
- with self.assertRaises(Exception) as ex:
- DbtParser.get_models(mock_self)
-
- mock_open.assert_called_once_with(Path(RUN_RESULTS_PATH))
- mock_self.parse_run_results.assert_called_once_with(run_results={})
- mock_self.parse_manifest.assert_not_called()
- self.assertIn("version to be", ex.exception.args[0])
-
- @patch("builtins.open", new_callable=mock_open, read_data="{}")
- def test_get_models_no_success(self, mock_open):
+ def test_get_run_results_models_no_success(self, mock_open):
mock_self = Mock()
mock_self.project_dir = Path()
mock_run_results = Mock()
mock_success_result = Mock()
mock_failed_result = Mock()
- mock_manifest = Mock()
mock_self.parse_run_results.return_value = mock_run_results
mock_run_results.metadata.dbt_version = "1.0.0"
mock_failed_result.unique_id = "failed_unique_id"
mock_success_result.status.name = "success"
mock_failed_result.status.name = "failed"
mock_run_results.results = [mock_failed_result]
- mock_self.parse_manifest.return_value = mock_manifest
- mock_manifest.nodes = {"success_unique_id": "a_unique_id"}
with self.assertRaises(Exception):
- DbtParser.get_models(mock_self)
+ DbtParser.get_run_results_models(mock_self)
mock_open.assert_any_call(Path(RUN_RESULTS_PATH))
- mock_open.assert_any_call(Path(MANIFEST_PATH))
mock_self.parse_run_results.assert_called_once_with(run_results={})
- mock_self.parse_manifest.assert_called_once_with(manifest={})
@patch("builtins.open", new_callable=mock_open, read_data="key:\n value")
- def test_set_project_dict(self, mock_open):
+ def test_get_project_dict(self, mock_open):
expected_dict = {"key1": "value1"}
mock_self = Mock()
mock_self.project_dir = Path()
mock_self.yaml.safe_load.return_value = expected_dict
- DbtParser.set_project_dict(mock_self)
+ project_dict = DbtParser.get_project_dict(mock_self)
- self.assertEqual(mock_self.project_dict, expected_dict)
+ self.assertEqual(project_dict, expected_dict)
mock_open.assert_called_once_with(Path(PROJECT_FILE))
- def test_set_connection_snowflake_success(self):
+ def test_set_connection_snowflake_success_password(self):
expected_driver = "snowflake"
expected_credentials = {"user": "user", "password": "password"}
mock_self = Mock()
- mock_self._get_connection_creds.return_value = (expected_credentials, expected_driver)
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
DbtParser.set_connection(mock_self)
@@ -156,13 +184,75 @@ def test_set_connection_snowflake_success(self):
self.assertEqual(mock_self.connection.get("driver"), expected_driver)
self.assertEqual(mock_self.connection.get("user"), expected_credentials["user"])
self.assertEqual(mock_self.connection.get("password"), expected_credentials["password"])
+ self.assertEqual(mock_self.connection.get("key"), None)
self.assertEqual(mock_self.requires_upper, True)
- def test_set_connection_snowflake_no_password(self):
+ def test_set_connection_snowflake_success_key(self):
+ expected_driver = "snowflake"
+ expected_credentials = {"user": "user", "private_key_path": "private_key_path"}
+ mock_self = Mock()
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
+
+ DbtParser.set_connection(mock_self)
+
+ self.assertIsInstance(mock_self.connection, dict)
+ self.assertEqual(mock_self.connection.get("driver"), expected_driver)
+ self.assertEqual(mock_self.connection.get("user"), expected_credentials["user"])
+ self.assertEqual(mock_self.connection.get("password"), None)
+ self.assertEqual(mock_self.connection.get("key"), expected_credentials["private_key_path"])
+ self.assertEqual(mock_self.requires_upper, True)
+
+ def test_set_connection_snowflake_success_key_and_passphrase(self):
+ expected_driver = "snowflake"
+ expected_credentials = {
+ "user": "user",
+ "private_key_path": "private_key_path",
+ "private_key_passphrase": "private_key_passphrase",
+ }
+ mock_self = Mock()
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
+
+ DbtParser.set_connection(mock_self)
+
+ self.assertIsInstance(mock_self.connection, dict)
+ self.assertEqual(mock_self.connection.get("driver"), expected_driver)
+ self.assertEqual(mock_self.connection.get("user"), expected_credentials["user"])
+ self.assertEqual(mock_self.connection.get("password"), None)
+ self.assertEqual(mock_self.connection.get("key"), expected_credentials["private_key_path"])
+ self.assertEqual(
+ mock_self.connection.get("private_key_passphrase"), expected_credentials["private_key_passphrase"]
+ )
+ self.assertEqual(mock_self.requires_upper, True)
+
+ def test_set_connection_snowflake_no_key_or_password(self):
expected_driver = "snowflake"
expected_credentials = {"user": "user"}
mock_self = Mock()
- mock_self._get_connection_creds.return_value = (expected_credentials, expected_driver)
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
+
+ with self.assertRaises(Exception):
+ DbtParser.set_connection(mock_self)
+
+ self.assertNotIsInstance(mock_self.connection, dict)
+
+ def test_set_connection_snowflake_authenticator(self):
+ expected_driver = "snowflake"
+ expected_credentials = {"user": "user", "authenticator": "authenticator"}
+ mock_self = Mock()
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
+
+ DbtParser.set_connection(mock_self)
+
+ self.assertIsInstance(mock_self.connection, dict)
+ self.assertEqual(mock_self.connection.get("driver"), expected_driver)
+ self.assertEqual(mock_self.connection.get("authenticator"), expected_credentials["authenticator"])
+ self.assertEqual(mock_self.connection.get("user"), expected_credentials["user"])
+
+ def test_set_connection_snowflake_key_and_password(self):
+ expected_driver = "snowflake"
+ expected_credentials = {"user": "user", "private_key_path": "private_key_path", "password": "password"}
+ mock_self = Mock()
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
with self.assertRaises(Exception):
DbtParser.set_connection(mock_self)
@@ -177,7 +267,7 @@ def test_set_connection_bigquery_success(self):
"dataset": "a_dataset",
}
mock_self = Mock()
- mock_self._get_connection_creds.return_value = (expected_credentials, expected_driver)
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
DbtParser.set_connection(mock_self)
@@ -195,7 +285,7 @@ def test_set_connection_bigquery_not_oauth(self):
}
mock_self = Mock()
- mock_self._get_connection_creds.return_value = (expected_credentials, expected_driver)
+ mock_self.get_connection_creds.return_value = (expected_credentials, expected_driver)
with self.assertRaises(Exception):
DbtParser.set_connection(mock_self)
@@ -205,7 +295,7 @@ def test_set_connection_not_implemented(self):
expected_driver = "unimplemented_provider"
mock_self = Mock()
- mock_self._get_connection_creds.return_value = (None, expected_driver)
+ mock_self.get_connection_creds.return_value = (None, expected_driver)
with self.assertRaises(NotImplementedError):
DbtParser.set_connection(mock_self)
@@ -228,7 +318,7 @@ def test_get_connection_creds_success(self, mock_open):
mock_self.project_dict = {"profile": "a_profile"}
mock_self.yaml.safe_load.return_value = profiles_dict
mock_self.ProfileRenderer().render_data.return_value = profile
- credentials, conn_type = DbtParser._get_connection_creds(mock_self)
+ credentials, conn_type = DbtParser.get_connection_creds(mock_self)
self.assertEqual(credentials, expected_credentials)
self.assertEqual(conn_type, "type1")
@@ -242,7 +332,7 @@ def test_get_connection_no_matching_profile(self, mock_open):
profile = profiles_dict["a_profile"]
mock_self.ProfileRenderer().render_data.return_value = profile
with self.assertRaises(ValueError):
- _, _ = DbtParser._get_connection_creds(mock_self)
+ _, _ = DbtParser.get_connection_creds(mock_self)
@patch("builtins.open", new_callable=mock_open, read_data="")
def test_get_connection_no_target(self, mock_open):
@@ -260,7 +350,7 @@ def test_get_connection_no_target(self, mock_open):
mock_self.project_dict = {"profile": "a_profile"}
mock_self.yaml.safe_load.return_value = profiles_dict
with self.assertRaises(ValueError):
- _, _ = DbtParser._get_connection_creds(mock_self)
+ _, _ = DbtParser.get_connection_creds(mock_self)
profile_yaml_no_outputs = """
a_profile:
@@ -277,7 +367,7 @@ def test_get_connection_no_outputs(self, mock_open):
mock_self.ProfileRenderer().render_data.return_value = profile
mock_self.yaml.safe_load.return_value = profiles_dict
with self.assertRaises(ValueError):
- _, _ = DbtParser._get_connection_creds(mock_self)
+ _, _ = DbtParser.get_connection_creds(mock_self)
@patch("builtins.open", new_callable=mock_open, read_data="")
def test_get_connection_no_credentials(self, mock_open):
@@ -294,7 +384,7 @@ def test_get_connection_no_credentials(self, mock_open):
profile = profiles_dict["a_profile"]
mock_self.ProfileRenderer().render_data.return_value = profile
with self.assertRaises(ValueError):
- _, _ = DbtParser._get_connection_creds(mock_self)
+ _, _ = DbtParser.get_connection_creds(mock_self)
@patch("builtins.open", new_callable=mock_open, read_data="")
def test_get_connection_no_target_credentials(self, mock_open):
@@ -313,7 +403,7 @@ def test_get_connection_no_target_credentials(self, mock_open):
mock_self.ProfileRenderer().render_data.return_value = profile
mock_self.yaml.safe_load.return_value = profiles_dict
with self.assertRaises(ValueError):
- _, _ = DbtParser._get_connection_creds(mock_self)
+ _, _ = DbtParser.get_connection_creds(mock_self)
@patch("builtins.open", new_callable=mock_open, read_data="")
def test_get_connection_no_type(self, mock_open):
@@ -330,20 +420,42 @@ def test_get_connection_no_type(self, mock_open):
profile = profiles_dict["a_profile"]
mock_self.ProfileRenderer().render_data.return_value = profile
with self.assertRaises(ValueError):
- _, _ = DbtParser._get_connection_creds(mock_self)
+ _, _ = DbtParser.get_connection_creds(mock_self)
+
+
+EXAMPLE_DIFF_RESULTS = {
+ "pks": {"exclusives": [5, 3]},
+ "values": {
+ "rows_with_differences": 2,
+ "total_rows": 10,
+ "columns_diff_stats": [
+ {"column_name": "name", "match": 80.0},
+ {"column_name": "age", "match": 100.0},
+ {"column_name": "city", "match": 0.0},
+ {"column_name": "country", "match": 100.0},
+ ],
+ },
+}
class TestDbtDiffer(unittest.TestCase):
- # These two integration tests can be used to test a real diff
- # export DATA_DIFF_DBT_PROJ=/path/to/a/dbt/project
- # Expects a valid dbt project using a ~/.dbt/profiles.yml with run results
+ # Set DATA_DIFF_DBT_PROJ to test against your own dbt project; otherwise the duckdb project in tests/dbt_artifacts is used
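+ # A minimal setup sketch for running against your own project (the path below is illustrative):
+ #   export DATA_DIFF_DBT_PROJ=/path/to/a/dbt/project
+ # The project is assumed to have a profiles.yml and existing run results (e.g. produced by `dbt run`).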
def test_integration_basic_dbt(self):
- project_dir = os.environ.get("DATA_DIFF_DBT_PROJ")
- if project_dir is not None:
- diff = run_datadiff_cli("--dbt", "--dbt-project-dir", project_dir)
- assert diff[-1].decode("utf-8") == "Diffs Complete!"
- else:
- pass
+ artifacts_path = os.getcwd() + "/tests/dbt_artifacts"
+ test_project_path = os.environ.get("DATA_DIFF_DBT_PROJ") or artifacts_path
+ diff = run_datadiff_cli(
+ "--dbt", "--dbt-project-dir", test_project_path, "--dbt-profiles-dir", test_project_path
+ )
+
+ # assertions for the diff that exists in tests/dbt_artifacts/jaffle_shop.duckdb
+ if test_project_path == artifacts_path:
+ diff_string = b"".join(diff).decode("utf-8")
+ # 5 diffs were run
+ assert diff_string.count("<>") == 5
+ # 4 with no diffs
+ assert diff_string.count("No row differences") == 4
+ # 1 with a diff
+ assert diff_string.count(" Rows Added Rows Removed") == 1
def test_integration_cloud_dbt(self):
project_dir = os.environ.get("DATA_DIFF_DBT_PROJ")
@@ -364,20 +476,27 @@ def test_local_diff(self, mock_diff_tables):
mock_diff = MagicMock()
mock_diff_tables.return_value = mock_diff
mock_diff.__iter__.return_value = [1, 2, 3]
+ threads = None
+ where = "a_string"
dev_qualified_list = ["dev_db", "dev_schema", "dev_table"]
prod_qualified_list = ["prod_db", "prod_schema", "prod_table"]
expected_keys = ["key"]
- diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, expected_keys, None, mock_connection, None)
+ diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, expected_keys, mock_connection, threads, where)
with patch("data_diff.dbt.connect_to_table", side_effect=[mock_table1, mock_table2]) as mock_connect:
_local_diff(diff_vars)
mock_diff_tables.assert_called_once_with(
- mock_table1, mock_table2, threaded=True, algorithm=Algorithm.JOINDIFF, extra_columns=ANY
+ mock_table1,
+ mock_table2,
+ threaded=True,
+ algorithm=Algorithm.JOINDIFF,
+ extra_columns=ANY,
+ where=where,
)
self.assertEqual(len(mock_diff_tables.call_args[1]["extra_columns"]), 2)
self.assertEqual(mock_connect.call_count, 2)
- mock_connect.assert_any_call(mock_connection, ".".join(dev_qualified_list), tuple(expected_keys), None)
- mock_connect.assert_any_call(mock_connection, ".".join(prod_qualified_list), tuple(expected_keys), None)
+ mock_connect.assert_any_call(mock_connection, ".".join(dev_qualified_list), tuple(expected_keys), threads)
+ mock_connect.assert_any_call(mock_connection, ".".join(prod_qualified_list), tuple(expected_keys), threads)
mock_diff.get_stats_string.assert_called_once()
@patch("data_diff.dbt.diff_tables")
@@ -394,12 +513,14 @@ def test_local_diff_no_diffs(self, mock_diff_tables):
dev_qualified_list = ["dev_db", "dev_schema", "dev_table"]
prod_qualified_list = ["prod_db", "prod_schema", "prod_table"]
expected_keys = ["primary_key_column"]
- diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, expected_keys, None, mock_connection, None)
+ threads = None
+ where = "a_string"
+ diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, expected_keys, mock_connection, threads, where)
with patch("data_diff.dbt.connect_to_table", side_effect=[mock_table1, mock_table2]) as mock_connect:
_local_diff(diff_vars)
mock_diff_tables.assert_called_once_with(
- mock_table1, mock_table2, threaded=True, algorithm=Algorithm.JOINDIFF, extra_columns=ANY
+ mock_table1, mock_table2, threaded=True, algorithm=Algorithm.JOINDIFF, extra_columns=ANY, where=where
)
self.assertEqual(len(mock_diff_tables.call_args[1]["extra_columns"]), 2)
self.assertEqual(mock_connect.call_count, 2)
@@ -409,107 +530,115 @@ def test_local_diff_no_diffs(self, mock_diff_tables):
@patch("data_diff.dbt.rich.print")
@patch("data_diff.dbt.os.environ")
- @patch("data_diff.dbt.requests.request")
- def test_cloud_diff(self, mock_request, mock_os_environ, mock_print):
+ @patch("data_diff.dbt.DatafoldAPI")
+ def test_cloud_diff(self, mock_api, mock_os_environ, mock_print):
expected_api_key = "an_api_key"
- mock_response = Mock()
- mock_response.json.return_value = {"id": 123}
- mock_request.return_value = mock_response
+ mock_api.create_data_diff.return_value = {"id": 123}
mock_os_environ.get.return_value = expected_api_key
dev_qualified_list = ["dev_db", "dev_schema", "dev_table"]
prod_qualified_list = ["prod_db", "prod_schema", "prod_table"]
expected_datasource_id = 1
expected_primary_keys = ["primary_key_column"]
- diff_vars = DiffVars(
- dev_qualified_list, prod_qualified_list, expected_primary_keys, expected_datasource_id, None, None
- )
- _cloud_diff(diff_vars)
-
- mock_request.assert_called_once()
- mock_print.assert_called_once()
- request_data_dict = mock_request.call_args[1]["json"]
- self.assertEqual(
- mock_request.call_args[1]["headers"]["Authorization"],
- "Key " + expected_api_key,
- )
- self.assertEqual(request_data_dict["data_source1_id"], expected_datasource_id)
- self.assertEqual(request_data_dict["data_source2_id"], expected_datasource_id)
- self.assertEqual(request_data_dict["table1"], prod_qualified_list)
- self.assertEqual(request_data_dict["table2"], dev_qualified_list)
- self.assertEqual(request_data_dict["pk_columns"], expected_primary_keys)
+ connection = None
+ threads = None
+ where = "a_string"
+ diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, expected_primary_keys, connection, threads, where)
+ _cloud_diff(diff_vars, expected_datasource_id, api=mock_api)
- @patch("data_diff.dbt.rich.print")
- @patch("data_diff.dbt.os.environ")
- @patch("data_diff.dbt.requests.request")
- def test_cloud_diff_ds_id_none(self, mock_request, mock_os_environ, mock_print):
- expected_api_key = "an_api_key"
- mock_response = Mock()
- mock_response.json.return_value = {"id": 123}
- mock_request.return_value = mock_response
- mock_os_environ.get.return_value = expected_api_key
- dev_qualified_list = ["dev_db", "dev_schema", "dev_table"]
- prod_qualified_list = ["prod_db", "prod_schema", "prod_table"]
- expected_datasource_id = None
- primary_keys = ["primary_key_column"]
- diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, primary_keys, expected_datasource_id, None, None)
- with self.assertRaises(ValueError):
- _cloud_diff(diff_vars)
+ mock_api.create_data_diff.assert_called_once()
+ self.assertEqual(mock_print.call_count, 2)
- mock_request.assert_not_called()
- mock_print.assert_not_called()
+ payload = mock_api.create_data_diff.call_args[1]["payload"]
+ self.assertEqual(payload.data_source1_id, expected_datasource_id)
+ self.assertEqual(payload.data_source2_id, expected_datasource_id)
+ self.assertEqual(payload.table1, prod_qualified_list)
+ self.assertEqual(payload.table2, dev_qualified_list)
+ self.assertEqual(payload.pk_columns, expected_primary_keys)
+ self.assertEqual(payload.filter1, where)
+ self.assertEqual(payload.filter2, where)
+ @patch("data_diff.dbt._initialize_api")
+ @patch("data_diff.dbt._get_diff_vars")
+ @patch("data_diff.dbt._local_diff")
+ @patch("data_diff.dbt._cloud_diff")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
- @patch("data_diff.dbt.os.environ")
- @patch("data_diff.dbt.requests.request")
- def test_cloud_diff_api_key_none(self, mock_request, mock_os_environ, mock_print):
- expected_api_key = None
- mock_response = Mock()
- mock_response.json.return_value = {"id": 123}
- mock_request.return_value = mock_response
- mock_os_environ.get.return_value = expected_api_key
- dev_qualified_list = ["dev_db", "dev_schema", "dev_table"]
- prod_qualified_list = ["prod_db", "prod_schema", "prod_table"]
- expected_datasource_id = 1
- primary_keys = ["primary_key_column"]
- diff_vars = DiffVars(dev_qualified_list, prod_qualified_list, primary_keys, expected_datasource_id, None, None)
- with self.assertRaises(ValueError):
- _cloud_diff(diff_vars)
+ def test_diff_is_cloud(
+ self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars, mock_initialize_api
+ ):
+ mock_dbt_parser_inst = Mock()
+ mock_model = Mock()
+ expected_dbt_vars_dict = {
+ "prod_database": "prod_db",
+ "prod_schema": "prod_schema",
+ "datasource_id": 1,
+ }
+ host = "a_host"
+ api_key = "a_api_key"
+ api = DatafoldAPI(api_key=api_key, host=host)
+ mock_initialize_api.return_value = api
+ connection = None
+ threads = None
+ where = "a_string"
- mock_request.assert_not_called()
- mock_print.assert_not_called()
+ mock_dbt_parser.return_value = mock_dbt_parser_inst
+ mock_dbt_parser_inst.get_models.return_value = [mock_model]
+ mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
+ expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], connection, threads, where)
+ mock_get_diff_vars.return_value = expected_diff_vars
+ dbt_diff(is_cloud=True)
+ mock_dbt_parser_inst.get_models.assert_called_once()
+ mock_dbt_parser_inst.set_connection.assert_not_called()
+ mock_initialize_api.assert_called_once()
+ mock_cloud_diff.assert_called_once_with(expected_diff_vars, 1, api)
+ mock_local_diff.assert_not_called()
+ mock_print.assert_called_once()
+
+ @patch("data_diff.dbt._initialize_api")
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
- def test_diff_is_cloud(self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars):
+ @patch("builtins.input", return_value="n")
+ def test_diff_is_cloud_no_ds_id(
+ self, _, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars, mock_initialize_api
+ ):
mock_dbt_parser_inst = Mock()
mock_model = Mock()
expected_dbt_vars_dict = {
"prod_database": "prod_db",
"prod_schema": "prod_schema",
- "datasource_id": 1,
}
+ host = "a_host"
+ api_key = "a_api_key"
+ api = DatafoldAPI(api_key=api_key, host=host)
+ mock_initialize_api.return_value = api
+ connection = None
+ threads = None
+ where = "a_string"
mock_dbt_parser.return_value = mock_dbt_parser_inst
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
- expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], 123, None, None)
+ expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
- dbt_diff(is_cloud=True)
+
+ with self.assertRaises(ValueError):
+ dbt_diff(is_cloud=True)
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_not_called()
- mock_cloud_diff.assert_called_once_with(expected_diff_vars)
+ mock_initialize_api.assert_called_once()
+ mock_cloud_diff.assert_not_called()
mock_local_diff.assert_not_called()
mock_print.assert_called_once()
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
def test_diff_is_not_cloud(self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars):
mock_dbt_parser_inst = Mock()
@@ -518,25 +647,26 @@ def test_diff_is_not_cloud(self, mock_print, mock_dbt_parser, mock_cloud_diff, m
expected_dbt_vars_dict = {
"prod_database": "prod_db",
"prod_schema": "prod_schema",
- "datasource_id": 1,
}
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
- expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], 123, None, None)
+ connection = None
+ threads = None
+ where = "a_string"
+ expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
dbt_diff(is_cloud=False)
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_called_once()
mock_cloud_diff.assert_not_called()
mock_local_diff.assert_called_once_with(expected_diff_vars)
- mock_print.assert_called_once()
+ mock_print.assert_not_called()
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
def test_diff_no_prod_configs(
self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars
@@ -550,13 +680,15 @@ def test_diff_no_prod_configs(
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
- expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], 123, None, None)
+ connection = None
+ threads = None
+ where = "a_string"
+ expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
with self.assertRaises(ValueError):
dbt_diff(is_cloud=False)
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_called_once()
mock_dbt_parser_inst.get_primary_keys.assert_not_called()
mock_cloud_diff.assert_not_called()
@@ -566,7 +698,7 @@ def test_diff_no_prod_configs(
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
def test_diff_only_prod_db(self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars):
mock_dbt_parser_inst = Mock()
@@ -578,21 +710,23 @@ def test_diff_only_prod_db(self, mock_print, mock_dbt_parser, mock_cloud_diff, m
}
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
- expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], 123, None, None)
+ connection = None
+ threads = None
+ where = "a_string"
+ expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
dbt_diff(is_cloud=False)
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_called_once()
mock_cloud_diff.assert_not_called()
mock_local_diff.assert_called_once_with(expected_diff_vars)
- mock_print.assert_called_once()
+ mock_print.assert_not_called()
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
def test_diff_only_prod_schema(
self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars
@@ -607,26 +741,29 @@ def test_diff_only_prod_schema(
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
- expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], 123, None, None)
+ connection = None
+ threads = None
+ where = "a_string"
+ expected_diff_vars = DiffVars(["dev"], ["prod"], ["pks"], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
with self.assertRaises(ValueError):
dbt_diff(is_cloud=False)
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_called_once()
mock_dbt_parser_inst.get_primary_keys.assert_not_called()
mock_cloud_diff.assert_not_called()
mock_local_diff.assert_not_called()
mock_print.assert_not_called()
+ @patch("data_diff.dbt._initialize_api")
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
def test_diff_is_cloud_no_pks(
- self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars
+ self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars, mock_initialize_api
):
mock_dbt_parser_inst = Mock()
mock_dbt_parser.return_value = mock_dbt_parser_inst
@@ -636,15 +773,22 @@ def test_diff_is_cloud_no_pks(
"prod_schema": "prod_schema",
"datasource_id": 1,
}
+ host = "a_host"
+ api_key = "a_api_key"
+ api = DatafoldAPI(api_key=api_key, host=host)
+ mock_initialize_api.return_value = api
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
- expected_diff_vars = DiffVars(["dev"], ["prod"], [], 123, None, None)
+ connection = None
+ threads = None
+ where = "a_string"
+ expected_diff_vars = DiffVars(["dev"], ["prod"], [], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
dbt_diff(is_cloud=True)
+ mock_initialize_api.assert_called_once()
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_not_called()
mock_cloud_diff.assert_not_called()
mock_local_diff.assert_not_called()
@@ -653,7 +797,7 @@ def test_diff_is_cloud_no_pks(
@patch("data_diff.dbt._get_diff_vars")
@patch("data_diff.dbt._local_diff")
@patch("data_diff.dbt._cloud_diff")
- @patch("data_diff.dbt.DbtParser.__new__")
+ @patch("data_diff.dbt_parser.DbtParser.__new__")
@patch("data_diff.dbt.rich.print")
def test_diff_not_is_cloud_no_pks(
self, mock_print, mock_dbt_parser, mock_cloud_diff, mock_local_diff, mock_get_diff_vars
@@ -669,13 +813,198 @@ def test_diff_not_is_cloud_no_pks(
mock_dbt_parser_inst.get_models.return_value = [mock_model]
mock_dbt_parser_inst.get_datadiff_variables.return_value = expected_dbt_vars_dict
-
- expected_diff_vars = DiffVars(["dev"], ["prod"], [], 123, None, None)
+ connection = None
+ threads = None
+ where = "a_string"
+ expected_diff_vars = DiffVars(["dev"], ["prod"], [], connection, threads, where)
mock_get_diff_vars.return_value = expected_diff_vars
dbt_diff(is_cloud=False)
mock_dbt_parser_inst.get_models.assert_called_once()
- mock_dbt_parser_inst.set_project_dict.assert_called_once()
mock_dbt_parser_inst.set_connection.assert_called_once()
mock_cloud_diff.assert_not_called()
mock_local_diff.assert_not_called()
- self.assertEqual(mock_print.call_count, 2)
+ self.assertEqual(mock_print.call_count, 1)
+
+ def test_get_diff_vars_replace_custom_schema(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ prod_schema = "a_prod_schema"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_custom_schema"
+ mock_model.config.schema_ = mock_model.schema_
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ mock_model.meta = None
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, prod_schema, "prod_", mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, "prod_" + mock_model.schema_, mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ assert prod_schema not in diff_vars.prod_path
+
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_vars_static_custom_schema(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ prod_schema = "a_prod_schema"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_custom_schema"
+ mock_model.config.schema_ = mock_model.schema_
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ mock_model.meta = None
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, prod_schema, "prod", mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, "prod", mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ assert prod_schema not in diff_vars.prod_path
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_vars_no_custom_schema_on_model(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ prod_schema = "a_prod_schema"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_custom_schema"
+ mock_model.config.schema_ = None
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ mock_model.meta = None
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, prod_schema, "prod", mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, prod_schema, mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_vars_match_dev_schema(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_schema"
+ mock_model.config.schema_ = None
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ mock_model.meta = None
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, None, None, mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_custom_schema_no_config_exception(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ prod_schema = "a_prod_schema"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_schema"
+ mock_model.config.schema_ = "a_custom_schema"
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+
+ with self.assertRaises(ValueError):
+ _get_diff_vars(mock_dbt_parser, prod_database, prod_schema, None, mock_model)
+
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_vars_meta_where(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_schema"
+ mock_model.config.schema_ = None
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ where = "a filter"
+ mock_model.meta = {"datafold": {"datadiff": {"filter": where}}}
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, None, None, mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ self.assertEqual(diff_vars.where_filter, where)
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_vars_meta_unrelated(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_schema"
+ mock_model.config.schema_ = None
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ where = None
+ mock_model.meta = {"key": "value"}
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, None, None, mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ self.assertEqual(diff_vars.where_filter, where)
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
+
+ def test_get_diff_vars_meta_none(self):
+ mock_model = Mock()
+ prod_database = "a_prod_db"
+ primary_keys = ["a_primary_key"]
+ mock_model.database = "a_dev_db"
+ mock_model.schema_ = "a_schema"
+ mock_model.config.schema_ = None
+ mock_model.alias = "a_model_name"
+ mock_dbt_parser = Mock()
+ mock_dbt_parser.get_pk_from_model.return_value = primary_keys
+ mock_dbt_parser.requires_upper = False
+ where = None
+ mock_model.meta = None
+
+ diff_vars = _get_diff_vars(mock_dbt_parser, prod_database, None, None, mock_model)
+
+ assert diff_vars.dev_path == [mock_model.database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.prod_path == [prod_database, mock_model.schema_, mock_model.alias]
+ assert diff_vars.primary_keys == primary_keys
+ assert diff_vars.connection == mock_dbt_parser.connection
+ assert diff_vars.threads == mock_dbt_parser.threads
+ self.assertEqual(diff_vars.where_filter, where)
+ mock_dbt_parser.get_pk_from_model.assert_called_once()
diff --git a/tests/test_diff_tables.py b/tests/test_diff_tables.py
index 352aa8fe..0f5d5164 100644
--- a/tests/test_diff_tables.py
+++ b/tests/test_diff_tables.py
@@ -3,8 +3,8 @@
import uuid
import unittest
-from sqeleton.queries import table, this, commit, code
-from sqeleton.utils import ArithAlphanumeric, numberToAlphanum
+from data_diff.sqeleton.queries import table, this, commit, code
+from data_diff.sqeleton.utils import ArithAlphanumeric, numberToAlphanum
from data_diff.hashdiff_tables import HashDiffer
from data_diff.joindiff_tables import JoinDiffer
@@ -425,7 +425,6 @@ def setUp(self):
self.b = table_segment(self.connection, self.table_dst_path, "id", "text_comment", case_sensitive=False)
def test_alphanum_keys(self):
-
differ = HashDiffer(bisection_factor=2, bisection_threshold=3)
diff = list(differ.diff_tables(self.a, self.b))
self.assertEqual(diff, [("-", (str(self.new_alphanum), "This one is different"))])
diff --git a/tests/test_joindiff.py b/tests/test_joindiff.py
index 08b8189e..9cc2197c 100644
--- a/tests/test_joindiff.py
+++ b/tests/test_joindiff.py
@@ -1,8 +1,8 @@
from typing import List
from datetime import datetime
-from sqeleton.queries.ast_classes import TablePath
-from sqeleton.queries import table, commit
+from data_diff.sqeleton.queries.ast_classes import TablePath
+from data_diff.sqeleton.queries import table, commit
from data_diff.table_segment import TableSegment
from data_diff import databases as db
from data_diff.joindiff_tables import JoinDiffer
diff --git a/tests/test_postgresql.py b/tests/test_postgresql.py
index b8881498..0f24198e 100644
--- a/tests/test_postgresql.py
+++ b/tests/test_postgresql.py
@@ -1,6 +1,6 @@
import unittest
-from sqeleton.queries import table, commit
+from data_diff.sqeleton.queries import table, commit
from data_diff import TableSegment, HashDiffer
from data_diff import databases as db