@sundarshankar89
Collaborator

Changes

What does this PR do?

  • Introduces new Synapse Profiler scripts for in-depth Azure Synapse assessment as part of Lakebridge resources.
  • Creates a YAML pipeline configuration (pipeline_config.yml) to orchestrate data extraction and metric collection across Synapse environments.

Relevant implementation details

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

@sundarshankar89 sundarshankar89 self-assigned this Sep 15, 2025
@sundarshankar89 sundarshankar89 requested a review from a team as a code owner September 15, 2025 07:11
@sundarshankar89 sundarshankar89 added the feat/profiler Issues related to profilers label Sep 15, 2025
@github-actions

github-actions bot commented Sep 15, 2025

✅ 39/39 passed, 5 flaky, 2m12s total

Flaky tests:

  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (11.805s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (12.894s)
  • 🤪 test_transpiles_informatica_to_sparksql (13.13s)
  • 🤪 test_transpile_teradata_sql (17.314s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (4.472s)

Running from acceptance #2633

Contributor

@m-abulazm m-abulazm left a comment

We need a better interface for execution.
I am not sure we want to leave it to developers to always initialise the workspace and credentials. This will lead to inconsistencies sooner or later.

I would also split extracting the "metrics" from persisting them, so we need at least two methods: extract and persist.
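
A minimal sketch of the split suggested above, not the PR's actual code. The names `ProfilerStep`, `run_step`, and the `fetch` callable are hypothetical, and the standard-library `sqlite3` module stands in for the DuckDB store so the example runs without extra dependencies. The caller owns the connection, so individual extract scripts do not set up the workspace or credentials themselves.

```python
import sqlite3
from typing import Callable, Iterable

Row = tuple[str, float]


class ProfilerStep:
    """Hypothetical profiler step that keeps extraction and persistence separate."""

    def __init__(self, table: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self.table = table
        # `fetch` stands in for a Synapse system-table query or metrics API call.
        self._fetch = fetch

    def extract(self) -> list[Row]:
        # Only reads from the source; nothing is written to the local store here.
        return list(self._fetch())

    def persist(self, conn: sqlite3.Connection, rows: list[Row]) -> int:
        # Only writes; the connection is created and owned by the caller.
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{self.table}" (name TEXT, value REAL)'
        )
        conn.executemany(f'INSERT INTO "{self.table}" VALUES (?, ?)', rows)
        conn.commit()
        return len(rows)


def run_step(step: ProfilerStep, conn: sqlite3.Connection) -> int:
    """Single entry point: extract first, then persist what was extracted."""
    return step.persist(conn, step.extract())
```

With this shape, a test can call `extract` against a fake `fetch` and inspect the rows without touching any database.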

@sundarshankar89
Collaborator Author

sundarshankar89 commented Oct 7, 2025

We need a better interface for execution. I am not sure we want to leave it to developers to always initialise the workspace and credentials. This will lead to inconsistencies sooner or later.

I would also split extracting the "metrics" from persisting them, so we need at least two methods: extract and persist.

I have simplified the usage in the latest push.

The primary purpose of this module is to retrieve information from system tables via queries, and metrics via APIs, organised into the following four scripts:

  • workspace_extract.py
  • monitoring_metrics_extract.py
  • dedicated_sqlpool_extract.py
  • serverless_sqlpool_extract.py

I don't believe further separating extract and persist makes sense.

Contributor

@m-abulazm m-abulazm left a comment

looks good

Contributor

@goodwillpunning goodwillpunning left a comment

LGTM! 🚢

conn = duckdb.connect(db_path)

# Drop existing table if it exists
conn.execute(f"DROP TABLE IF EXISTS {table_name}")
Contributor

If the profiler is run again the next day, for example, there is no way to accumulate history, correct?

Contributor

It seems there are two similar functions for inserting profiler data, with radically different behavior. Workspace artifacts and monitoring metrics use this function, which does not support appending. However, serverless SQL pools and dedicated SQL pools use a function called save_resultset_to_db, which does support appending. Why the difference in behavior between these sets of Synapse objects? Perhaps combine them into a single function and use the mode param to dictate the behavior?
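
One way the combined function could look, as a rough sketch rather than the project's implementation. The function name `save_rows` and its parameters are hypothetical, and `sqlite3` from the standard library stands in for DuckDB so the example is self-contained. With `mode="append"`, repeated profiler runs accumulate rows; with `mode="overwrite"`, the table is dropped first, matching the `DROP TABLE IF EXISTS` behavior shown in the diff.

```python
import sqlite3
from typing import Sequence


def save_rows(
    conn: sqlite3.Connection,
    table_name: str,
    columns: Sequence[str],
    rows: Sequence[tuple],
    mode: str = "append",
) -> None:
    """Hypothetical unified writer: `mode` is either "overwrite" or "append"."""
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unsupported mode: {mode!r}")

    if mode == "overwrite":
        # Same effect as the existing drop-and-recreate path.
        conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')

    # Create the table on first use; later appends reuse it.
    column_list = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({column_list})')

    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
    )
    conn.commit()
```

Whether "append" or "overwrite" should be the default is a design choice for the follow-up; the sketch picks "append" only so that history is kept unless the caller opts out.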

Collaborator Author

I will address this in a follow-up PR. Good catch.

Enhanced version of save_resultset_to_db with predetermined schemas.
"""

# Predetermined schemas
Contributor

+1
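
For readers without the diff open, the predetermined-schema idea from the docstring above might be sketched as follows. The `SCHEMAS` mapping, its table and column names, and `create_table_from_schema` are all hypothetical, and `sqlite3` again stands in for DuckDB. The point is that the table shape comes from a fixed definition rather than being inferred from whatever rows a query happens to return.

```python
import sqlite3

# Hypothetical schema registry; the real table and column names are not shown here.
SCHEMAS: dict[str, dict[str, str]] = {
    "sessions": {"session_id": "TEXT", "login_time": "TEXT", "status": "TEXT"},
}


def create_table_from_schema(conn: sqlite3.Connection, table_name: str) -> list[str]:
    """Create `table_name` from its predetermined schema and return its column names."""
    try:
        schema = SCHEMAS[table_name]
    except KeyError:
        raise KeyError(f"no predetermined schema for {table_name!r}") from None

    # Column order and types come from the registry, not from the result set.
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in schema.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({cols})')
    return list(schema)
```

Under this assumption, an empty result set still produces a table with the expected columns, which keeps downstream queries from failing on a missing table.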

@sundarshankar89 sundarshankar89 added this pull request to the merge queue Oct 14, 2025
Merged via the queue into main with commit 1c0d8df Oct 14, 2025
9 checks passed
@sundarshankar89 sundarshankar89 deleted the feature/synapse_profiler_scripts branch October 14, 2025 05:54

Labels

feat/profiler Issues related to profilers

4 participants