Handle empty SQL resultsets gracefully in assessment pipeline #2172
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##  patch/profiler_test_tmp_path   #2172   +/- ##
=================================================
  Coverage     ?   63.52%
=================================================
  Files        ?      100
  Lines        ?     8508
  Branches     ?      886
=================================================
  Hits         ?     5405
  Misses       ?     2936
  Partials     ?      167
```
✅ 52/52 passed, 6 flaky, 4m28s total
Running from acceptance #3156
```diff
 with duckdb.connect(db_path) as conn:
     # TODO: Add support for figuring out data types from SQLAlchemy result object; result.cursor.description is not reliable
-    schema = ' STRING, '.join(result.columns) + ' STRING'
+    schema = ', '.join(f"{col} STRING" for col in result.columns)
```
I'm not sure what the source of the column names is, but they should probably be escaped.
A quick glance at the DuckDB docs hints that they don't provide a method for this, but essentially escaping identifiers means something like:
```python
def escape_duckdb_identifier(name: str) -> str:
    escaped = name.replace('"', '""')
    return f'"{escaped}"'
```
This is for extracting the information schema. Do we want to handle this here? I would rather not accept any SQL query that has non-compliant column names.
This escape has a larger footprint, requiring updates for data retrieval and dashboard creation. Can we create a ticket to address this issue everywhere?
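To make the escaping suggestion concrete, here is a minimal sketch of how it could be applied when building the schema string. The column names are hypothetical, and as noted above DuckDB does not appear to provide a built-in helper, so this follows standard SQL identifier quoting (double any embedded quote, then wrap in double quotes):

```python
def escape_duckdb_identifier(name: str) -> str:
    # Standard SQL identifier quoting: double embedded quotes, wrap in double quotes.
    escaped = name.replace('"', '""')
    return f'"{escaped}"'

# Hypothetical column names, including one with a space and one with an embedded quote.
columns = ["id", "first name", 'we"ird']
schema = ', '.join(f"{escape_duckdb_identifier(col)} STRING" for col in columns)
print(schema)  # → "id" STRING, "first name" STRING, "we""ird" STRING
```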
```python
placeholders = ', '.join(['?'] * len(result.columns))
conn.executemany(f"INSERT INTO {step_name} VALUES ({placeholders})", result.rows)
logging.info(f"Successfully inserted {row_count} rows into table '{step_name}'.")
```
Is there a way for us to see this statement in the logs before it executes?
I can store it in a pandas DataFrame and print it.
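Another option for the question above is simply to log the rendered statement at DEBUG level before executing it. A sketch, assuming the naming from the diff; the table name, columns, and rows here are placeholders, and the commented-out `executemany` call would run against the open DuckDB connection:

```python
import logging

logging.basicConfig(level=logging.DEBUG)

step_name = "usage_summary"      # hypothetical table name
columns = ["id", "name"]         # hypothetical columns
rows = [(1, "a"), (2, "b")]      # hypothetical rows

placeholders = ', '.join(['?'] * len(columns))
insert_sql = f"INSERT INTO {step_name} VALUES ({placeholders})"

# Surface the statement (and row count) before it runs.
logging.debug("About to execute: %s with %d rows", insert_sql, len(rows))
# conn.executemany(insert_sql, rows)
```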
Changes
What does this PR do?
Improves the assessment pipeline to handle SQL queries that return empty resultsets (0 rows) without causing failures or creating empty tables. This change adds validation to check row counts before attempting table operations and provides clear logging for operations.
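The guard described above could look roughly like the following. This is a sketch only: `result`, `step_name`, the helper name, and the connection handling are assumptions based on the diff, not the actual implementation in the PR.

```python
import logging

def write_step_result(conn, step_name, result):
    """Skip table creation entirely when the query returned no rows."""
    row_count = len(result.rows)
    if row_count == 0:
        logging.info("Query for step '%s' returned 0 rows; skipping table creation.", step_name)
        return False
    # Build a STRING-typed schema from the result columns, as in the diff.
    schema = ', '.join(f"{col} STRING" for col in result.columns)
    conn.execute(f"CREATE TABLE {step_name} ({schema})")
    placeholders = ', '.join(['?'] * len(result.columns))
    conn.executemany(f"INSERT INTO {step_name} VALUES ({placeholders})", result.rows)
    logging.info("Successfully inserted %d rows into table '%s'.", row_count, step_name)
    return True
```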
Relevant implementation details
Caveats/things to watch out for when reviewing:
Linked issues
Resolves #..
Functionality
databricks labs lakebridge ...
Tests