
Conversation

@sundarshankar89
Collaborator

Changes

What does this PR do?

Improves the assessment pipeline to handle SQL queries that return empty result sets (0 rows) without failing or creating empty tables. The change validates the row count before attempting any table operation and logs clearly when a step is skipped.
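The guard described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: it uses sqlite3 in place of the pipeline's DuckDB connection, and `save_results`, `step_name`, and the result shape are illustrative assumptions.

```python
import logging
import sqlite3

def save_results(conn, step_name, columns, rows):
    """Create and populate a per-step table, skipping entirely on 0 rows.

    Hypothetical helper illustrating the row-count guard; all names here
    are assumptions, not the pipeline's real API.
    """
    row_count = len(rows)
    if row_count == 0:
        # Empty result set: do not create a table, just log and bail out.
        logging.info("Query for step '%s' returned 0 rows; skipping table creation.", step_name)
        return False
    schema = ', '.join(f"{col} TEXT" for col in columns)
    conn.execute(f"CREATE TABLE {step_name} ({schema})")
    placeholders = ', '.join(['?'] * len(columns))
    conn.executemany(f"INSERT INTO {step_name} VALUES ({placeholders})", rows)
    logging.info("Successfully inserted %d rows into table '%s'.", row_count, step_name)
    return True

conn = sqlite3.connect(":memory:")
ok_empty = save_results(conn, "empty_step", ["a", "b"], [])
ok_full = save_results(conn, "full_step", ["a", "b"], [("1", "2")])
print(ok_empty, ok_full)
```

The key point is that the guard runs before `CREATE TABLE`, so a 0-row query leaves no empty table behind.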

Relevant implementation details

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

@codecov

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (patch/profiler_test_tmp_path@55b3b87). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...databricks/labs/lakebridge/assessments/pipeline.py 0.00% 9 Missing ⚠️
Additional details and impacted files
@@                       Coverage Diff                       @@
##             patch/profiler_test_tmp_path    #2172   +/-   ##
===============================================================
  Coverage                                ?   63.52%           
===============================================================
  Files                                   ?      100           
  Lines                                   ?     8508           
  Branches                                ?      886           
===============================================================
  Hits                                    ?     5405           
  Misses                                  ?     2936           
  Partials                                ?      167           

☔ View full report in Codecov by Sentry.

@github-actions

github-actions bot commented Dec 2, 2025

✅ 52/52 passed, 6 flaky, 4m28s total

Flaky tests:

  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (22.751s)
  • 🤪 test_transpiles_informatica_to_sparksql (22.537s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (4.284s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (21.108s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (21.306s)
  • 🤪 test_transpile_teradata_sql (6.441s)

Running from acceptance #3156

with duckdb.connect(db_path) as conn:
    # TODO: Add support for figuring out data types from the SQLAlchemy result object; result.cursor.description is not reliable
-   schema = ' STRING, '.join(result.columns) + ' STRING'
+   schema = ', '.join(f"{col} STRING" for col in result.columns)
Contributor

I'm not sure what the source of the column names is, but they should probably be escaped.

A quick glance at the DuckDB docs hints that they don't provide a method for this, but essentially escaping identifiers means something like:

def escape_duckdb_identifier(name: str) -> str:
    escaped = name.replace('"', '""')  # double any embedded quotes
    return f'"{escaped}"'

Collaborator Author

This is for extracting the information schema. Do we want to handle this here? I would rather not accept any SQL query that has non-compliant column names.
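The stricter alternative suggested here (rejecting non-compliant names rather than quoting them) could look something like this; the identifier pattern and the `validate_column_names` helper are assumptions for illustration, not code from the PR:

```python
import re

# Hypothetical validator: accept only simple unquoted identifiers
# (letters, digits, underscores; must not start with a digit).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def validate_column_names(columns: list[str]) -> list[str]:
    """Raise if any column name would need quoting/escaping."""
    bad = [c for c in columns if not _IDENTIFIER.match(c)]
    if bad:
        raise ValueError(f"Non-compliant column names: {bad}")
    return columns

print(validate_column_names(["table_name", "row_count"]))
```

Failing fast at validation time keeps the escaping concern out of every downstream consumer (data retrieval, dashboards), at the cost of rejecting otherwise-valid quoted identifiers.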

Collaborator Author

@sundarshankar89 sundarshankar89 Dec 5, 2025

This escape has a larger footprint, requiring updates for data retrieval and dashboard creation. Can we create a ticket to address this issue everywhere?

Comment on lines +259 to +261
placeholders = ', '.join(['?'] * len(result.columns))
conn.executemany(f"INSERT INTO {step_name} VALUES ({placeholders})", result.rows)
logging.info(f"Successfully inserted {row_count} rows into table '{step_name}'.")
Contributor

Is there a way for us to see this statement in the logging before it executes?

Collaborator Author

I can store it in a pandas DataFrame and print it.
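One simple way to surface the statement before it runs (independent of the pandas suggestion) is to log it at DEBUG level just before `executemany`. A sketch under stated assumptions: sqlite3 stands in for DuckDB, and `insert_rows`/`step1` are illustrative names:

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")

def insert_rows(conn, step_name, columns, rows):
    placeholders = ', '.join(['?'] * len(columns))
    statement = f"INSERT INTO {step_name} VALUES ({placeholders})"
    # Log the statement and row count BEFORE execution so a failure is traceable.
    logging.debug("About to execute: %s (%d rows)", statement, len(rows))
    conn.executemany(statement, rows)
    logging.info("Successfully inserted %d rows into table '%s'.", len(rows), step_name)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE step1 (a TEXT, b TEXT)")
insert_rows(conn, "step1", ["a", "b"], [("1", "2"), ("3", "4")])
print(conn.execute("SELECT COUNT(*) FROM step1").fetchone()[0])
```

Logging before the call means the statement appears even when `executemany` itself raises.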

@sundarshankar89 sundarshankar89 changed the base branch from main to patch/profiler_test_tmp_path December 4, 2025 08:11
@sundarshankar89 sundarshankar89 added stacked PR Should be reviewed, but not merged feat/profiler Issues related to profilers labels Dec 4, 2025
Base automatically changed from patch/profiler_test_tmp_path to main December 12, 2025 15:06

Labels

feat/profiler Issues related to profilers stacked PR Should be reviewed, but not merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: SQL Script execution crashes the profiler where empty output returns

3 participants