Contributor

@asnare asnare commented Sep 24, 2025

Changes

What does this PR do?

This PR implements a describe-transpile subcommand that describes the currently installed transpilers and associated dialects and configuration. This is intended for diagnostics and use by the UI. When run normally, it provides output like this:

% databricks labs lakebridge describe-transpile
Transpiler   Installed Version  Plugin Configuration
==========   =================  ====================
Morpheus     0.6.6              /Users/me/.databricks/labs/remorph-transpilers/databricks-morph-plugin/lib/config.yml
Bladebridge  0.1.15             /Users/me/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml

Supported Source Dialects
=========================
 - datastage
 - informatica (desktop edition)
 - informatica cloud
 - mssql
 - netezza
 - oracle
 - snowflake
 - synapse
 - teradata
 - tsql

When the --output=json option is provided to the Databricks CLI, more information is available:

% databricks labs lakebridge describe-transpile --output=json
{
  "available-dialects": [
    "datastage",
    "informatica (desktop edition)",
    "informatica cloud",
    "mssql",
    "netezza",
    "oracle",
    "snowflake",
    "synapse",
    "teradata",
    "tsql"
  ],
  "installed-transpilers": [
    {
      "config-path":"/Users/andrew.snare/.databricks/labs/remorph-transpilers/databricks-morph-plugin/lib/config.yml",
      "name":"Morpheus",
      "supported-dialects": {
        "snowflake": {
          "options": []
        },
        "tsql": {
          "options": []
        }
      },
      "versions": {
        "installed":"0.6.6",
        "latest":null
      }
    },
    {
      "config-path":"/Users/andrew.snare/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml",
      "name":"Bladebridge",
      "supported-dialects": {
        "datastage": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            },
            {
              "choices": [
                "SPARKSQL",
                "PYSPARK"
              ],
              "flag":"target-tech",
              "method":"CHOICE",
              "prompt":"Specify which technology should be generated"
            }
          ]
        },
        "informatica (desktop edition)": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            },
            {
              "choices": [
                "SPARKSQL",
                "PYSPARK"
              ],
              "flag":"target-tech",
              "method":"CHOICE",
              "prompt":"Specify which technology should be generated"
            }
          ]
        },
        "informatica cloud": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            }
          ]
        },
        "mssql": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            }
          ]
        },
        "netezza": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            }
          ]
        },
        "oracle": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            }
          ]
        },
        "synapse": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            }
          ]
        },
        "teradata": {
          "options": [
            {
              "default":"\u003cnone\u003e",
              "flag":"overrides-file",
              "method":"QUESTION",
              "prompt":"Specify the config file to override the default[Bladebridge] config - press \u003center\u003e for none"
            }
          ]
        }
      },
      "versions": {
        "installed":"0.1.15",
        "latest":null
      }
    }
  ]
}
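
For consumers of the JSON output (such as the UI), here is a minimal sketch of how the per-transpiler dialect information might be pulled out of it. The helper name `transpilers_by_dialect` and the trimmed sample are illustrative assumptions and not part of this PR; only the key names are taken from the output above.

```python
import json
from collections import defaultdict


def transpilers_by_dialect(description: dict) -> dict[str, list[str]]:
    """Map each source dialect to the names of the transpilers that support it.

    Hypothetical helper: it relies only on the "installed-transpilers",
    "supported-dialects" and "name" keys shown in the JSON output.
    """
    mapping: dict[str, list[str]] = defaultdict(list)
    for transpiler in description["installed-transpilers"]:
        for dialect in transpiler["supported-dialects"]:
            mapping[dialect].append(transpiler["name"])
    # Sort for stable, readable output.
    return {dialect: sorted(names) for dialect, names in sorted(mapping.items())}


# Trimmed sample in the shape shown above (config paths and options omitted).
sample = json.loads("""
{
  "available-dialects": ["datastage", "snowflake", "tsql"],
  "installed-transpilers": [
    {"name": "Morpheus",
     "supported-dialects": {"snowflake": {"options": []}, "tsql": {"options": []}},
     "versions": {"installed": "0.6.6", "latest": null}},
    {"name": "Bladebridge",
     "supported-dialects": {"datastage": {"options": []}},
     "versions": {"installed": "0.1.15", "latest": null}}
  ]
}
""")

for dialect, names in transpilers_by_dialect(sample).items():
    print(f"{dialect}: {', '.join(names)}")
```

In practice the input would come from the output of the command run with --output=json rather than from an inline string.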

Relevant implementation details

The formatting of the details is handled by a new TranspilerDescription class, so that the output format can be properly controlled. (For compatibility this will need to be controlled tightly, even if the internal details change.)
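
To illustrate the idea of keeping the output format separate from internal details, here is a rough sketch of what such a description class could look like. This is not the actual TranspilerDescription implementation from this PR: the class name `TranspilerSummary`, its fields, and the `as_json` method are assumptions made for illustration, and only the JSON key names are taken from the output above.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TranspilerSummary:
    """Hypothetical stand-in for a description class (not the PR's code)."""

    name: str
    config_path: str
    installed_version: str | None
    latest_version: str | None = None  # no lookup yet, so this stays None
    supported_dialects: dict[str, list[dict]] = field(default_factory=dict)

    def as_json(self) -> dict:
        # The external key names are fixed here, independently of the
        # attribute names, so internals can change without breaking consumers.
        return {
            "config-path": self.config_path,
            "name": self.name,
            "supported-dialects": {
                dialect: {"options": options}
                for dialect, options in self.supported_dialects.items()
            },
            "versions": {
                "installed": self.installed_version,
                "latest": self.latest_version,
            },
        }


summary = TranspilerSummary(
    name="Morpheus",
    config_path="/path/to/config.yml",  # placeholder path
    installed_version="0.6.6",
    supported_dialects={"snowflake": [], "tsql": []},
)
print(json.dumps(summary.as_json(), indent=2))
```

Keeping the serialization in one method means a rename of an internal attribute does not leak into the JSON that the UI depends on.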

Currently there is no lookup to determine the latest version of an installed transpiler. That is out of scope for this PR, although the JSON output already reserves a field for it.

Linked issues

Additional tests in the areas modified by this code are implemented in:

Functionality

  • added new CLI command: databricks labs lakebridge describe-transpile

Tests

  • manually tested
  • added unit tests
  • added integration tests

@asnare asnare self-assigned this Sep 24, 2025
@asnare asnare requested a review from a team as a code owner September 24, 2025 12:36
@asnare asnare added the enhancement New feature or request label Sep 24, 2025
@asnare asnare added the feat/cli actions that are visible to the user label Sep 24, 2025

github-actions bot commented Sep 24, 2025

✅ 29/29 passed, 2 flaky, 1m17s total

Flaky tests:

  • 🤪 test_transpiles_informatica_with_sparksql (9.669s)
  • 🤪 test_transpile_sql_file (8.479s)

Running from acceptance #2358

Collaborator

@sundarshankar89 sundarshankar89 left a comment

LGTM
nit: I would like to see which source is supported by which transpiler,
and a docs update about the new command.

Contributor Author

asnare commented Sep 25, 2025

LGTM. nit: I would like to see which source is supported by which transpiler, and a docs update about the new command.

The dialects for each transpiler are listed in the JSON output. A choice I made here was that the normal non-JSON output is a high-level summary, and the JSON view provides the gory details.

Agreed on documentation, that's a big gap and something I need to rectify.

Contributor

@m-abulazm m-abulazm left a comment

LGTM. I just have one comment/question

all_configs = self.all_transpiler_configs()
return frozenset(all_configs.keys())

def installed_transpilers(self) -> Mapping[str, TranspilerInfo]:
Contributor

Could we use this to add the transpiler version to the telemetry? It might also make sense to cache this with @cached_property.

Contributor Author

We could use this for obtaining the transpiler version, but it's overkill: for telemetry you only need the version that's in use, not the whole set or other meta-data.

Regarding @cached_property I considered this but decided not to:

  • It's reasonably expensive to calculate, which makes it not very property-like.
  • For all our use-cases it's only called once. (This isn't a property that various components access on a repository instance that is passed around.)
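
The trade-off discussed here can be sketched as follows. The class `RepositorySketch` and its `_scan` method are hypothetical stand-ins, not the repository class from this PR; the scan counter exists only to make the number of recalculations visible.

```python
from functools import cached_property


class RepositorySketch:
    """Hypothetical stand-in used to compare a plain method with a cached property."""

    def __init__(self) -> None:
        self.scans = 0  # counts how often the expensive work runs

    def _scan(self) -> dict:
        # Placeholder for the expensive part (reading configs, versions, ...).
        self.scans += 1
        return {"Morpheus": "0.6.6"}

    def installed_transpilers(self) -> dict:
        # Plain method: recomputed on every call, and the call syntax
        # signals to the reader that work is being done.
        return self._scan()

    @cached_property
    def installed_transpilers_cached(self) -> dict:
        # Cached property: computed on first access, then stored on the instance.
        return self._scan()


repo = RepositorySketch()
repo.installed_transpilers()
repo.installed_transpilers()
print(repo.scans)  # 2: each call repeats the scan

repo.installed_transpilers_cached
repo.installed_transpilers_cached
print(repo.scans)  # 3: only the first access triggered a scan
```

With a single call site, both variants do the same amount of work, which is the argument for keeping the plain method.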

Contributor

I usually consider caching if something is expensive to calculate; otherwise I don't see the benefit.

I get that it is used once, but since it is public, repeated calls can happen, even if that is highly unlikely.

Contributor

@m-abulazm m-abulazm left a comment

approved. also responded to the previous review

@asnare asnare added this pull request to the merge queue Sep 25, 2025
Merged via the queue into main with commit 8ae887a Sep 25, 2025
9 checks passed
@asnare asnare deleted the transpiler-info branch September 25, 2025 12:22