Conversation

@hadia206
Contributor

@hadia206 hadia206 commented Jun 13, 2025

resolves #296

  • Add Snowflake dialect support
  • Add SF dialect tests for TPC-H and Defog
  • Update dialect bindings based on available Snowflake functions.
  • Update PR CI to include flag for testing all dialects.
  • Update Defog init scripts in different dialects to match original scripts + add one for Snowflake.

Testing passed for all dialects; see here.

Contributor

@knassre-bodo knassre-bodo left a comment

Left behind some initial feedback

Contributor

@knassre-bodo knassre-bodo left a comment

Few more revisions

@hadia206 hadia206 changed the base branch from main to kian/decor_semi_anti June 19, 2025 23:07
@hadia206 hadia206 changed the base branch from kian/decor_semi_anti to main June 20, 2025 14:10
@hadia206 hadia206 changed the base branch from main to kian/decor_semi_anti June 20, 2025 14:57
Base automatically changed from kian/decor_semi_anti to main June 23, 2025 15:44
@hadia206 hadia206 changed the title from "[DRAFT] SF Testing and Dialect" to "Snowflake Dialect and Testing" Aug 25, 2025
@hadia206 hadia206 marked this pull request as ready for review August 25, 2025 23:39
@hadia206 hadia206 requested review from a team, john-sanchez31, juankx-bodo and knassre-bodo and removed request for a team August 25, 2025 23:39
Comment on lines +100 to +101
When submitting a PR, you can control which CI tests run by adding special flags
to your **latest commit message**.
Contributor

Let's specify that all of these flags are case insensitive
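
For reference, a minimal sketch of what case-insensitive flag matching could look like, assuming the flags are detected by searching the latest commit message. Only `[run CI]` appears in the README excerpt here; the helper name `has_ci_flag` is hypothetical, and this is not the project's actual CI logic.

```python
def has_ci_flag(commit_message: str, flag: str = "[run CI]") -> bool:
    """Return True if `flag` occurs in the commit message, ignoring case."""
    # casefold() normalizes case on both sides before the substring check.
    return flag.casefold() in commit_message.casefold()
```

With this approach, `[run CI]`, `[RUN ci]`, and `[Run Ci]` would all trigger the same tests.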

README.md Outdated
When submitting a PR, you can control which CI tests run by adding special flags
to your **latest commit message**.

- To run **PyDough CI tests**, add: `[run CI]`
Contributor

Specify that this one will not run tests from any SQL database except SQLite

@@ -0,0 +1,307 @@
{
Contributor

@knassre-bodo knassre-bodo Aug 26, 2025

Let's also show another example where we create the connector object, THEN pass it in to Snowflake with connection=..., then run another test on that (& after that test show us doing something with the connector, like fetching the query text of the last query run through that connector).



@@ -0,0 +1,307 @@
{
Contributor

@knassre-bodo knassre-bodo Aug 26, 2025

NIT: let's fix the formatting on the first line so lines is on the next line and is matched up with .PARTITION



Contributor

@knassre-bodo knassre-bodo left a comment

LGTM, just some last nitpicks and cleanup revisions

"validate_qualify_columns": False,
}
# Exclude Snowflake dialect to avoid some issues
# related to qualify and column decorrelation
Contributor

Is it column de-correlation or just column name qualification? I ask because I'm pretty sure qualification happens somewhere else

Contributor Author

You're right, I mixed things up.
Updated it.

Comment on lines +222 to +244
@pytest.fixture(scope="session")
def get_sf_sample_graph(
    sf_sample_graph_path: str,
    valid_sample_graph_names: set[str],
) -> graph_fetcher:
    """
    A function that takes in the name of a graph from the supported sample
    Snowflake graph names and returns the metadata for that PyDough graph.
    """

    @cache
    def impl(name: str) -> GraphMetadata:
        if name not in valid_sample_graph_names:
            raise Exception(f"Unrecognized graph name '{name}'")
        return pydough.parse_json_metadata_from_file(
            file_path=sf_sample_graph_path, graph_name=name
        )

    return impl


@pytest.fixture(scope="session")
def get_sf_defog_graphs() -> graph_fetcher:
Contributor

Let's merge these: we can place the sample graphs & defog graphs for Snowflake in the same JSON file, and merge the fixtures since they are functions that take in the name of a graph and return that graph within that file. We can ignore the valid_sample_graph_names stuff since that is a mostly-redundant layer of error checking.

Contributor Author

I'll address that in a followup PR. For now, I will keep them separate to match what we have for SQLite.
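
For the follow-up, a rough sketch of the merged fetcher idea: one factory that takes a single JSON file and returns a cached name-to-graph lookup. This is only an illustration under stated assumptions. The standard-library `json` module stands in for `pydough.parse_json_metadata_from_file`, the file is assumed to hold a list of objects with a `"name"` key, and `make_graph_fetcher` is a hypothetical name. In the test suite it would presumably be wrapped in a `@pytest.fixture(scope="session")`.

```python
import json
from functools import cache
from typing import Any, Callable


def make_graph_fetcher(file_path: str) -> Callable[[str], dict[str, Any]]:
    """Build a cached lookup of graphs by name from one JSON file."""

    @cache
    def impl(name: str) -> dict[str, Any]:
        # Assumption: the file holds a JSON list of objects with a "name" key.
        with open(file_path) as f:
            graphs = json.load(f)
        for graph in graphs:
            if graph.get("name") == name:
                return graph
        # The lookup itself reports unknown names, so a separate set of
        # valid graph names is not needed.
        raise ValueError(f"Unrecognized graph name '{name}'")

    return impl
```

Because the lookup raises on unknown names, the sample graphs and the Defog graphs could share one file and one fetcher without the extra name-validation layer.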

return expr.this if isinstance(expr, SQLGlotAlias) else expr


def is_boolean_expression(expr: SQLGlotExpression) -> bool:
Contributor

Is this being used anywhere? I can't find it. If not, let's delete.

Contributor Author

Good catch, old code

Comment on lines 77 to 80
def convert_get_part(
    self, args: SQLGlotExpression, types: list[PyDoughType]
) -> SQLGlotExpression:
    return sqlglot_expressions.Anonymous(this="SPLIT_PART", expressions=args)
Contributor

Can get rid of this if we are using PYDOP_TO_SNOWFLAKE_FUNC
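
To illustrate the table-driven idea, here is a small sketch in which a lookup table replaces a per-function override. It renders plain strings instead of SQLGlot expressions, and both the table name `OP_TO_FUNC_NAME` and the `"GETPART"` key are hypothetical; the real contents of `PYDOP_TO_SNOWFLAKE_FUNC` are not shown in this conversation.

```python
# Hypothetical table: only the SPLIT_PART binding is taken from the snippet
# under review; the operator key is an assumed name.
OP_TO_FUNC_NAME: dict[str, str] = {"GETPART": "SPLIT_PART"}


def render_call(op_name: str, args: list[str]) -> str:
    """Render a function call by looking the operator up in the table."""
    func_name = OP_TO_FUNC_NAME[op_name]
    return f"{func_name}({', '.join(args)})"
```

Any operator that maps one-to-one onto a dialect function can then be handled by adding a table entry rather than a dedicated method.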

Comment on lines +58 to +60
def convert_sum(
    self, arg: SQLGlotExpression, types: list[PyDoughType]
) -> SQLGlotExpression:
Contributor

I have an idea: let's move this method to the base class, then have this be the override, that way we don't need to case on SUM in the subclasses' implementation of convert_call_to_sqlglot. We can also edit the mysql file accordingly.

Contributor Author

Updated base and SF files.
MySQL doesn't have a SUM variant.

Contributor

🤦 I was thinking of PostgreSQL
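
A minimal sketch of the pattern discussed in this thread: the base class owns a default `convert_sum` hook, the dispatcher always routes SUM through it, and a dialect subclass only overrides the hook. Strings stand in for SQLGlot expressions, the class and method names other than `convert_sum` are hypothetical, and the `COALESCE` variant is purely illustrative, since the actual Snowflake override is not shown in this conversation.

```python
class BaseBindings:
    """Stand-in for a base dialect bindings class."""

    def convert_sum(self, arg: str) -> str:
        # Default rendering shared by every dialect.
        return f"SUM({arg})"

    def convert_call(self, op_name: str, arg: str) -> str:
        # SUM is always routed through the overridable hook, so subclasses
        # never need to special-case it in their own dispatcher.
        if op_name == "SUM":
            return self.convert_sum(arg)
        return f"{op_name}({arg})"


class SnowflakeLikeBindings(BaseBindings):
    """Stand-in for a dialect that needs a different SUM rendering."""

    def convert_sum(self, arg: str) -> str:
        # Illustrative variant only, not the override from the PR.
        return f"COALESCE(SUM({arg}), 0)"
```

Dialects without a SUM variant simply inherit the default and need no changes.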

Comment on lines 66 to 67
arg (SQLGlotExpression): The argument to the SUM function.
types (list[PyDoughType]): The types of the arguments.
Contributor

Let's include backticks for better tooltips, and exclude the types (since they are already in the type annotation).

Suggested change
- arg (SQLGlotExpression): The argument to the SUM function.
- types (list[PyDoughType]): The types of the arguments.
+ `arg`: The argument to the SUM function.
+ `types`: The types of the arguments.

    constant_propagation: bool = False,
    dialect: DialectType = None,
    max_depth: int | None = None,
):
Contributor

Add a type hint for the return value, and throughout the function where necessary

Contributor Author

This is copy/pasted from SQLGlot, so I won't update it

Args:
    expression: expression to simplify
    constant_propagation: whether the constant propagation rule should be used
    max_depth: Chains of Connectors (AND, OR, etc) exceeding `max_depth` will be skipped
Contributor

Add the dialect parameter

Contributor Author

Same

    constant_propagation: whether the constant propagation rule should be used
    max_depth: Chains of Connectors (AND, OR, etc) exceeding `max_depth` will be skipped
Returns:
    sqlglot.Expression: simplified expression
Contributor

Suggested change
- sqlglot.Expression: simplified expression
+ simplified expression

Contributor Author

Same

Comment on lines 98 to 101
if isinstance(expr, boolean_types):
    return True

return False
Contributor

We can simplify this just to:

Suggested change
- if isinstance(expr, boolean_types):
-     return True
- return False
+ return isinstance(expr, boolean_types)

Contributor Author

Thanks for the tip. However, the whole function isn't needed, so I deleted that code.

    unit: DateTimeUnit,
) -> SQLGlotExpression:
    # Update argument type to fit datetime
    dt_expr = self.handle_datetime_base_arg(args[0])
Contributor

Type hint if required

@hadia206 hadia206 merged commit ebc0e40 into main Aug 26, 2025
12 checks passed
@hadia206 hadia206 deleted the Hadia/sf branch August 26, 2025 23:20

Development

Successfully merging this pull request may close these issues.

Add support for Snowflake databases & dialect in PyDough

4 participants