Conversation

@goodwillpunning (Contributor) commented Sep 8, 2025

Changes

What does this PR do?

  • Adds an initial implementation for uploading an AI/BI Dashboard that summarizes a profiler assessment extraction run to a "local" Databricks workspace.

Relevant implementation details

  • Implemented so that other EDW profilers only need to upload a dashboard template (a JSON representation of the AI/BI dashboard); a sketch of the resulting flow is shown below.
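
For illustration, a minimal sketch of that flow, assuming a template file on disk and a `PROFILER_SCHEMA` placeholder (the helper name and file layout are inventions for this example, not the PR's actual code):

import json
from pathlib import Path


def render_dashboard_template(template_path: Path, schema_name: str) -> str:
    """Load a dashboard template and substitute the profiler schema placeholder."""
    dashboard_str = template_path.read_text(encoding="utf-8")
    json.loads(dashboard_str)  # fail fast if the template is not valid JSON
    return dashboard_str.replace("`PROFILER_SCHEMA`", f"`{schema_name}`")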

Caveats/things to watch out for when reviewing:

  • A Databricks hostname and token are needed to perform integration tests.

Linked issues

  • N/A

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • added skeleton classes

Tests

  • manually tested
  • added unit tests
  • added integration tests

@goodwillpunning goodwillpunning requested a review from a team as a code owner September 8, 2025 21:44
@goodwillpunning goodwillpunning added the feat/profiler Issues related to profilers label Sep 8, 2025

github-actions bot commented Sep 8, 2025

✅ 27/27 passed, 2 flaky, 1m39s total

Flaky tests:

  • 🤪 test_transpiles_informatica_with_sparksql (10.249s)
  • 🤪 test_transpile_sql_file (12.708s)

Running from acceptance #2236

dashboard_str = dashboard_str.replace("`PROFILER_SCHEMA`", f"`{schema_name}`")

# TODO: check if the dashboard exists and unpublish it if it does
# TODO: create a warehouse ID
Collaborator commented:

This should be taken care of during deployment; this part assumes the necessary infra is already set up.
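
If this check does move into code later, a rough sketch using the Databricks SDK (the dashboard path is a placeholder, and the workspace/Lakeview calls should be verified against the SDK version pinned in this repo):

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound


def unpublish_if_exists(ws: WorkspaceClient, dashboard_path: str) -> None:
    """Unpublish a previously deployed dashboard at this path, if any."""
    try:
        obj = ws.workspace.get_status(dashboard_path)
    except NotFound:
        return  # nothing deployed yet
    if obj.resource_id:  # for dashboard objects, resource_id is the dashboard id
        ws.lakeview.unpublish(obj.resource_id)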

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


Collaborator commented:

Add a comment explaining why we are deviating from the LSQL deployment, so other reviewers don't block this PR.


def __init__(self, ws: WorkspaceClient, current_user: User, is_debug: bool = False):
    self._ws = ws
    self._current_user = current_user
Collaborator commented:

I think you don't need this attribute; you can retrieve it from the workspace client.
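
For example, the SDK exposes the authenticated principal directly, so the constructor could drop the parameter (a sketch of the suggestion, not the final shape):

def __init__(self, ws: WorkspaceClient, is_debug: bool = False):
    self._ws = ws
    # ws.current_user.me() resolves the authenticated user from the client itself
    self._current_user = ws.current_user.me().user_name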

def __init__(self, ws: WorkspaceClient, current_user: User, is_debug: bool = False):
    self._ws = ws
    self._current_user = current_user
    self._dashboard_location = f"/Workspace/Users/{self._current_user}/Lakebridge/Dashboards"
Collaborator commented:

We should rely on blueprint to save the file to the workspace. It should be inside the .lakebridge folder.
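
A minimal sketch of what that could look like with blueprint's Installation helper (the product name and target file name are assumptions for illustration):

from databricks.labs.blueprint.installation import Installation

install = Installation.assume_user_home(ws, "lakebridge")
# resolves to /Users/<me>/.lakebridge, so uploads land in the standard folder
install.upload("dashboards/profiler_summary.lvdash.json", dashboard_str.encode("utf-8"))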

"tasks": [
NotebookTask(
notebook_path=f"/Workspace/{databricks_user}/Lakebridge/profiler/load_extracted_tables.py",
base_parameters={
Collaborator commented:

There are two ways we can implement this: package the ingestion job as a Python package and use a wheel task, or upload the notebook and then run the jobs. I prefer option 1 (a sketch follows).
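
For option 1, a wheel task could roughly look like this (package name, entry point, and parameters are placeholders, not the actual lakebridge entry points):

from databricks.sdk.service import jobs

schema_name = "profiler_schema"  # placeholder
ingest_task = jobs.Task(
    task_key="load_extracted_tables",
    python_wheel_task=jobs.PythonWheelTask(
        package_name="databricks_labs_lakebridge",  # placeholder wheel name
        entry_point="load_extracted_tables",  # placeholder console entry point
        parameters=[f"--schema={schema_name}"],
    ),
)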

assessment.run()


@lakebridge.command(is_unauthenticated=False)
Collaborator commented:

Suggested change
- @lakebridge.command(is_unauthenticated=False)
+ @lakebridge.command()
