Merged
Changes from 17 commits
Commits
35 commits
ed53c99
WIP: downloading from s3
john-sanchez31 Nov 7, 2025
a05fab6
testing infrastructure for s3 custom datasets [run ci]
john-sanchez31 Nov 11, 2025
fddceff
flag run custom for custom dataset tests created
john-sanchez31 Nov 12, 2025
26d039c
adding all connectors for custom test
john-sanchez31 Nov 12, 2025
cd894dd
synthea s3 testing, adding bug test [run custom]
john-sanchez31 Nov 12, 2025
7b1f1f0
excluding custom test for [run ci] [run custom]
john-sanchez31 Nov 13, 2025
f31176e
Fix for call expression alias bug with quoted column names
john-sanchez31 Nov 14, 2025
ad86371
quoting the entire alias, adding a test for each function [run all]
john-sanchez31 Nov 14, 2025
277aea7
now dialect flag does not run custom tests
john-sanchez31 Nov 14, 2025
4fc6a2f
testing [run all]
john-sanchez31 Nov 14, 2025
dce5428
adding mark and secrets for custom tests [run all]
john-sanchez31 Nov 14, 2025
1af103d
set env for custom dataset [run all]
john-sanchez31 Nov 14, 2025
2ee3a34
custom ci separated [run all]
john-sanchez31 Nov 17, 2025
5680d1f
secret name fixed [run all]
john-sanchez31 Nov 17, 2025
85b4e58
Merge branch 'John/s3_testing' into John/callexp_alias_patch
john-sanchez31 Nov 17, 2025
c618f81
testing [run all]
john-sanchez31 Nov 17, 2025
78fb6e5
comments addressed [run all]
john-sanchez31 Nov 17, 2025
bd7dd0d
using input_name [run all]
john-sanchez31 Nov 17, 2025
447fcb4
keeping the db files
john-sanchez31 Nov 19, 2025
654b105
custom and s3 datasets separated
john-sanchez31 Nov 19, 2025
cf2aac2
s3 flag created [run s3]
john-sanchez31 Nov 19, 2025
f116a22
conlficts solved [run s3]
john-sanchez31 Nov 19, 2025
10b2e23
testing [run all]
john-sanchez31 Nov 19, 2025
68e2139
fixture added [run all]
john-sanchez31 Nov 19, 2025
47fef7f
testing [run ci]
john-sanchez31 Nov 19, 2025
25e1474
no initialized db fixed [run all]
john-sanchez31 Nov 19, 2025
fa6f4d1
init script added [run all]
john-sanchez31 Nov 19, 2025
cea2b25
Merge branch 'John/s3_testing' into John/callexp_alias_patch
john-sanchez31 Nov 20, 2025
2fc522d
removing special chars [run all]
john-sanchez31 Nov 20, 2025
31521a7
testing sf [run all]
john-sanchez31 Nov 20, 2025
e5bf954
sf masked tests updated [run all]
john-sanchez31 Nov 20, 2025
3bc7a8e
test refsol updated [run all]
john-sanchez31 Nov 20, 2025
fc3e38f
undo underscore allowed [run all]
john-sanchez31 Nov 20, 2025
21ce516
conflicts solved
john-sanchez31 Nov 20, 2025
df077b4
testing [run all]
john-sanchez31 Nov 20, 2025
105 changes: 105 additions & 0 deletions .github/workflows/custom_testing.yml
@@ -0,0 +1,105 @@
name: Run Custom Tests All dialects

on:
  workflow_call:
    inputs:
      python-versions:
        description: "JSON string of Python versions"
        type: string
        required: true
    secrets:
      READ_LLM_FIXTURES_ROLE:
        required: true
      SF_USERNAME:
        required: true
      SF_PASSWORD:
        required: true
      SF_ACCOUNT:
        required: true
      MYSQL_USERNAME:
        required: true
      MYSQL_PASSWORD:
        required: true
      POSTGRES_USER:
        required: true
      POSTGRES_PASSWORD:
        required: true

jobs:
  custom-tests:
    name: Custom Tests (Python ${{ matrix.python-version }})
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    strategy:
      matrix:
        python-version: ${{ fromJSON(inputs.python-versions) }}

    # Define services here to run Docker containers alongside your job
    services:
      mysql:
        image: bodoai1/pydough-mysql-tpch:latest
        env:
          # Set environment variables for MySQL container
          MYSQL_ROOT_PASSWORD: ${{ secrets.MYSQL_PASSWORD }}
          MYSQL_DATABASE: tpch
        ports:
          - 3306:3306

      postgres:
        image: bodoai1/pydough-postgres-tpch:latest
        env:
          # Set environment variables for Postgres container
          POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
          POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
          POSTGRES_DB: "pydough_test"
        ports:
          - 5432:5432
    env:
      # MYSQL env
      MYSQL_USERNAME: ${{ secrets.MYSQL_USERNAME }}
      MYSQL_PASSWORD: ${{ secrets.MYSQL_PASSWORD }}
      MYSQL_DATABASE: tpch
      MYSQL_HOST: 127.0.0.1
      # POSTGRES env
      POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
      POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
      POSTGRES_DB: pydough_test
      POSTGRES_HOST: 127.0.0.1
      # SNOWFLAKE env
      SF_USERNAME: ${{ secrets.SF_USERNAME }}
      SF_PASSWORD: ${{ secrets.SF_PASSWORD }}
      SF_ACCOUNT: ${{ secrets.SF_ACCOUNT }}

    steps:
      - name: Configure AWS Credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.READ_LLM_FIXTURES_ROLE }}
          aws-region: us-east-2

      - uses: actions/checkout@v4

      - name: Setup Python ${{ matrix.python-version }}
        id: setup-python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          version: "0.4.23"

      - name: Create virtual environment
        run: uv venv

      - name: Install dependencies
        run: uv pip install -e ".[boto3, snowflake, mysql, postgres]"

      - name: Confirm all connectors are installed
        run: uv run python -c "import boto3; import mysql.connector; import snowflake.connector; import psycopg2; print('All connectors installed')"

      - name: Run Custom Tests for all dialects
        run: uv run pytest -m custom tests/ -rs
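
The "Confirm all connectors are installed" step above is a single inline `python -c` command, so it stops at the first import that fails. As a rough sketch only, and assuming a standalone helper would be acceptable here, the same check could report every missing module at once. The function name `missing_modules` and the `CONNECTORS` list are hypothetical and not part of this PR; the module names are copied from the inline command.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is absent,
            # e.g. "mysql.connector" when "mysql" is not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Hypothetical constant: module names copied from the workflow's inline check.
CONNECTORS = ["boto3", "mysql.connector", "snowflake.connector", "psycopg2"]

# Prints the list of connectors that are not importable in this environment.
print(missing_modules(CONNECTORS))
```

Unlike the inline command, this only locates the modules and does not execute their top-level code (apart from importing parent packages), so it would not catch a connector that is installed but fails at import time.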
31 changes: 28 additions & 3 deletions .github/workflows/pr_testing.yml
@@ -46,6 +46,11 @@ on:
         type: boolean
         required: false
         default: false
+      run-custom:
+        description: "Run Custom Datasets Tests"
+        type: boolean
+        required: false
+        default: false

 # Limit CI to cancel previous runs in the same PR
 concurrency:
@@ -108,8 +113,6 @@ jobs:
           # Output to GitHub Actions expected format
           echo "matrix=$joined" >> $GITHUB_OUTPUT

-
-
   run-python-tests:
     name: Main Python Tests
     needs: [get-msg, get-py-ver-matrix]
@@ -148,7 +151,7 @@ jobs:
         run: uv run ruff check .

       - name: Run Tests
-        run: uv run pytest tests/ -m "not (snowflake or mysql or postgres or sf_masked)" -rs
+        run: uv run pytest tests/ -m "not (snowflake or mysql or postgres or sf_masked or custom)" -rs

   run-defog-daily-update:
     name: Run DEFOG Daily Update
@@ -232,3 +235,25 @@ jobs:
       python-versions: ${{ github.event_name == 'workflow_dispatch'
         && needs.get-py-ver-matrix.outputs.matrix
         || '["3.10", "3.11", "3.12"]' }}
+
+  run-custom-tests:
+    name: Custom datasets Tests
+    needs: [get-msg, get-py-ver-matrix]
+    if: |
+      (github.event_name == 'pull_request' && contains(needs.get-msg.outputs.commitMsg, '[run all]')) ||
+      (github.event_name == 'pull_request' && contains(needs.get-msg.outputs.commitMsg, '[run custom]')) ||
+      (github.event_name == 'workflow_dispatch' && (inputs.run-all || inputs.run-custom))
+    uses: ./.github/workflows/custom_testing.yml
+    secrets:
+      READ_LLM_FIXTURES_ROLE: ${{ secrets.READ_LLM_FIXTURES_ROLE }}
+      SF_USERNAME: ${{ secrets.SF_USERNAME }}
+      SF_PASSWORD: ${{ secrets.SF_PASSWORD }}
+      SF_ACCOUNT: ${{ secrets.SF_ACCOUNT }}
+      MYSQL_USERNAME: ${{ secrets.MYSQL_USERNAME }}
+      MYSQL_PASSWORD: ${{ secrets.MYSQL_PASSWORD }}
+      POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
+      POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
+    with:
+      python-versions: ${{ github.event_name == 'workflow_dispatch'
+        && needs.get-py-ver-matrix.outputs.matrix
+        || '["3.10", "3.11", "3.12"]' }}
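
The `if:` expression of the `run-custom-tests` job can be read as a small predicate. The sketch below restates it in Python purely as an aid to reading the condition; the function name `should_run_custom` and its parameters are hypothetical stand-ins for `github.event_name`, the commit message output of `get-msg`, and the two `workflow_dispatch` inputs. It lower-cases the message because the GitHub Actions `contains` function is not case sensitive.

```python
def should_run_custom(event_name, commit_msg="", run_all=False, run_custom=False):
    """Restate the job's `if:` expression as a plain boolean function."""
    msg = commit_msg.lower()  # GitHub's contains() ignores case
    if event_name == "pull_request":
        # Either flag in the commit message triggers the custom tests.
        return "[run all]" in msg or "[run custom]" in msg
    if event_name == "workflow_dispatch":
        # Manual runs are controlled by the boolean workflow inputs.
        return bool(run_all or run_custom)
    return False


print(should_run_custom("pull_request", "synthea s3 testing, adding bug test [run custom]"))
print(should_run_custom("pull_request", "testing [run ci]"))
print(should_run_custom("workflow_dispatch", run_custom=True))
```

Under this reading, a commit tagged only with `[run ci]` does not start the custom job, while `[run all]` and `[run custom]` both do.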
20 changes: 19 additions & 1 deletion pydough/conversion/column_bubbler.py
@@ -76,7 +76,25 @@ def generate_cleaner_names(expr: RelationalExpression, current_name: str) -> lis
         if len(expr.inputs) == 1:
             input_expr = expr.inputs[0]
             if isinstance(input_expr, ColumnReference):
-                result.append(f"{expr.op.function_name.lower()}_{input_expr.name}")
+                input_name: str = input_expr.name
+                quoted: bool = False
Contributor review comment:

I think the better fix here is to not do any of this special handling, but instead just remove all characters from input_name except letters/numbers/underscores. We don't have to worry about conflicts because the point of this function is that it returns potential alternative names, and it only uses them if those names are not already in use.

+                # If the name is quoted, remove the quotes and quote the entire
+                # generated name later.
+                if (
+                    input_name.startswith('"')
+                    and input_name.endswith('"')
+                    or input_name.startswith("`")
+                    and input_name.endswith("`")
+                ):
+                    input_name = input_name[1:-1]
+                    quoted = True
+
+                cleaner_name: str = f"{expr.op.function_name.lower()}_{input_name}"
+                if quoted:
+                    cleaner_name = f'"{cleaner_name}"'
+
+                result.append(cleaner_name)

         if len(expr.inputs) == 0 and expr.op.function_name.lower() == "count":
             result.append("n_rows")
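
To compare the two approaches discussed in this hunk, here is a minimal standalone sketch. It is not the PyDough implementation: the real code works on `RelationalExpression` objects, whereas the two functions below (`quoted_alias` and `sanitized_alias`, both hypothetical names) take plain strings. The first follows the logic added in this diff, the second follows the reviewer's suggestion of keeping only letters, numbers, and underscores.

```python
import re


def quoted_alias(function_name, input_name):
    """Diff's approach: unwrap a quoted column name, then quote the whole alias."""
    quoted = False
    # Explicit parentheses; equivalent to the diff's `and`/`or` precedence.
    if (input_name.startswith('"') and input_name.endswith('"')) or (
        input_name.startswith("`") and input_name.endswith("`")
    ):
        input_name = input_name[1:-1]
        quoted = True
    alias = f"{function_name.lower()}_{input_name}"
    # As in the diff, the result is always wrapped in double quotes,
    # even when the input used backticks.
    return f'"{alias}"' if quoted else alias


def sanitized_alias(function_name, input_name):
    """Reviewer's suggestion: drop everything except letters, digits, underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", input_name)
    return f"{function_name.lower()}_{cleaned}"


print(quoted_alias("SUM", '"order total"'))     # "sum_order total"
print(sanitized_alias("SUM", '"order total"'))  # sum_ordertotal
print(quoted_alias("SUM", "price"))             # sum_price
```

The sanitized variant never needs quoting because its output contains only identifier characters, which is the simplification the review comment argues for.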

1 change: 1 addition & 0 deletions pyproject.toml
@@ -43,6 +43,7 @@ snowflake = ["snowflake-connector-python[pandas]"]
 mysql = ["mysql-connector-python"]
 postgres = ["psycopg2-binary"]
 server = ["fastapi", "httpx", "uvicorn"]
+boto3 = ["boto3"]

 [build-system]
 requires = ["hatchling", "hatch-vcs"]
1 change: 1 addition & 0 deletions pytest.ini
@@ -6,3 +6,4 @@ markers =
     postgres: marks tests that require PostgresSQL credentials
     server: marks tests that require api mock server
     sf_masked: marks tests that require Snowflake Masked credentials
+    custom: marks tests that require custom datasets from s3
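
The `custom` marker registered here is what the `-m` expressions in the two workflows select on. The sketch below is a rough plain-Python model of that selection, not pytest's actual implementation; the function names and the `EXCLUDED` set are hypothetical, with the marker names copied from the `pytest` commands in this PR.

```python
# Markers excluded by the main test job:
#   -m "not (snowflake or mysql or postgres or sf_masked or custom)"
EXCLUDED = {"snowflake", "mysql", "postgres", "sf_masked", "custom"}


def runs_in_main_suite(test_markers):
    """True when a test carries none of the excluded markers."""
    return not (set(test_markers) & EXCLUDED)


def runs_in_custom_suite(test_markers):
    """True when a test carries the `custom` marker, as with `-m custom`."""
    return "custom" in set(test_markers)


print(runs_in_main_suite([]))                  # unmarked test
print(runs_in_main_suite(["custom"]))          # custom dataset test
print(runs_in_custom_suite(["custom", "mysql"]))
```

In this model a test marked `custom` is skipped by the main job and picked up only by the dedicated custom workflow, which matches the split this PR introduces.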