80 changes: 72 additions & 8 deletions docs/lakebridge/docs/reconcile/index.mdx
@@ -38,6 +38,76 @@ Refer to [Reconcile Configuration Guide](reconcile_configuration) for detailed i

> 2. Set up the connection properties

#### Option A: Using Lakebridge credentials mechanism
Reconcile connection properties are configured through a dynamic mapping from connection property to value.
The values can be loaded from Databricks secrets or environment variables, or used directly, depending on the configuration in `reconcile.yml`:
```yaml
...
creds_or_secret_scope:
  vault_type: local
  source_creds:
    <mappings of connection properties to values>
```
Alternatively, to use Databricks secrets, set `vault_type: databricks`; each value must then be in the form `<scope_name>/<secret_key>`:
```yaml
...
creds_or_secret_scope:
  vault_type: databricks
  source_creds:
    some_property: <scope_name>/<secret_key>
...
```
The expected connection properties under `source_creds` per data source are:
<Tabs>
<TabItem value="snowflake" label="Snowflake">
```yaml
sfUrl = [local_or_databricks_mapping]
account = [local_or_databricks_mapping]
sfUser = [local_or_databricks_mapping]
sfPassword = [local_or_databricks_mapping]
sfDatabase = [local_or_databricks_mapping]
sfSchema = [local_or_databricks_mapping]
sfWarehouse = [local_or_databricks_mapping]
sfRole = [local_or_databricks_mapping]
pem_private_key = [local_or_databricks_mapping]
pem_private_key_password = [local_or_databricks_mapping]
```

:::note
For Snowflake authentication, either sfPassword or pem_private_key is required.
Priority is given to pem_private_key, and if it is not found, sfPassword will be used.
If neither is available, an exception will be raised.

When using an encrypted pem_private_key, you'll need to provide the pem_private_key_password.
This password is used to decrypt the private key for authentication.
:::
</TabItem>
<TabItem value="oracle" label="Oracle">
```yaml
user = [local_or_databricks_mapping]
password = [local_or_databricks_mapping]
host = [local_or_databricks_mapping]
port = [local_or_databricks_mapping]
database = [local_or_databricks_mapping]
```
</TabItem>
<TabItem value="mssql" label="MS SQL Server (incl. Synapse)">
```yaml
user = [local_or_databricks_mapping]
password = [local_or_databricks_mapping]
host = [local_or_databricks_mapping]
port = [local_or_databricks_mapping]
database = [local_or_databricks_mapping]
encrypt = [local_or_databricks_mapping]
trustServerCertificate = [local_or_databricks_mapping]
```
</TabItem>
</Tabs>
#### Option B: Using secret scopes
:::warning
Deprecated in favor of the Lakebridge credentials mechanism (Option A).
:::

Lakebridge-Reconcile manages connection properties using secrets stored in the Databricks workspace.
Below is the default secret naming convention for managing connection properties.

@@ -66,17 +136,11 @@ Below are the connection properties required for each source:
sfSchema = [schema]
sfWarehouse = [warehouse_name]
sfRole = [role_name]
pem_private_key = [pkcs8_pem_private_key]
pem_private_key_password = [pkcs8_pem_private_key]
```

:::note
For Snowflake authentication, either sfPassword or pem_private_key is required.
Priority is given to pem_private_key, and if it is not found, sfPassword will be used.
If neither is available, an exception will be raised.

When using an encrypted pem_private_key, you'll need to provide the pem_private_key_password.
This password is used to decrypt the private key for authentication.
For Snowflake authentication, sfPassword is required. To use pem_private_key
(and optionally pem_private_key_password), use the Lakebridge credentials mechanism instead.
:::
</TabItem>
<TabItem value="oracle" label="Oracle">
36 changes: 32 additions & 4 deletions docs/lakebridge/docs/reconcile/recon_notebook.mdx
@@ -72,12 +72,13 @@ class ReconcileConfig:
    secret_scope: str
    database_config: DatabaseConfig
    metadata_config: ReconcileMetadataConfig
    creds_or_secret_scope: ReconcileCredentialConfig | str | None = None
```
Parameters:

- `data_source`: The data source to be reconciled. Supported values: `snowflake`, `teradata`, `oracle`, `mssql`, `synapse`, `databricks`.
- `report_type`: The type of report to be generated. Available report types are `schema`, `row`, `data` or `all`. For details check [here](./dataflow_example.mdx).
- `secret_scope`: The secret scope name used to store the connection credentials for the source database system.
- `secret_scope`: (Deprecated in favor of `creds_or_secret_scope` and kept for backwards compatibility) The secret scope name used to store the connection credentials for the source database system.
- `database_config`: The database configuration for connecting to the source database. Expects a `DatabaseConfig` object.
- `source_schema`: The source schema name.
- `target_catalog`: The target catalog name.
@@ -104,20 +105,31 @@ class ReconcileMetadataConfig:
```
If not set, the default values will be used to store the metadata. The default resources are created during the installation
of Lakebridge.
- `creds_or_secret_scope`: The credentials used to connect to the data source. Made optional for backwards compatibility.
Can also be a string containing a secret scope name to mimic the old credentials behavior; if set, `secret_scope` will be ignored.
- `vault_type`: `local` to use the values directly, `env` to load them from environment variables, or `databricks` to load them from Databricks secrets.
- `source_creds`: A mapping of reconcile credential keys to the values that will be resolved according to the vault type.
```python
@dataclass
class ReconcileCredentialConfig:
    vault_type: str
    source_creds: dict[str, str]
```
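
As a minimal sketch of the string fallback described above (assuming `database_config` and `metadata_config` are built exactly as in the full example that follows):

```python
# Sketch only: passing a plain secret-scope name instead of a
# ReconcileCredentialConfig mimics the legacy `secret_scope` behavior.
reconcile_config = ReconcileConfig(
    data_source="snowflake",
    report_type="all",
    secret_scope="NOT_USED",                       # ignored when creds_or_secret_scope is set
    database_config=database_config,               # a DatabaseConfig, as in the full example below
    metadata_config=metadata_config,               # a ReconcileMetadataConfig, as in the full example below
    creds_or_secret_scope="snowflake-credential",  # plain string = secret scope name
)
```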

An example of configuring the Reconcile properties:

```python
from databricks.labs.lakebridge.config import (
    DatabaseConfig,
    ReconcileConfig,
    ReconcileMetadataConfig
    ReconcileMetadataConfig,
    ReconcileCredentialConfig
)

reconcile_config = ReconcileConfig(
    data_source = "snowflake",
    report_type = "all",
    secret_scope = "snowflake-credential",
    secret_scope = "NOT_USED",
    database_config= DatabaseConfig(source_catalog="source_sf_catalog",
                                    source_schema="source_sf_schema",
                                    target_catalog="target_databricks_catalog",
@@ -126,9 +138,25 @@ reconcile_config = ReconcileConfig(
    metadata_config = ReconcileMetadataConfig(
        catalog = "lakebridge_metadata",
        schema= "reconcile"
    )
    ),
    creds_or_secret_scope=ReconcileCredentialConfig(
        vault_type="local",
        source_creds={"sfUrl": "[email protected]", "sfUser": "app", "sfPassword": "the P@asswort", "sfRole": "app"}
    )
)
```
An example of using Databricks secrets for the source credentials:
```python
reconcile_config = ReconcileConfig(
    ...,
    creds_or_secret_scope=ReconcileCredentialConfig(
        vault_type="databricks",
        source_creds={"sfUrl": "some_secret_scope/some_key", "sfUser": "another_secret_scope/user_key", "sfPassword": "scope/key", "sfRole": "scope/key"}
    )
)

```
All of the expected connection properties for the chosen data source must be configured.
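
The `env` vault type is not illustrated above. Below is a hedged sketch that assumes each value in `source_creds` names the environment variable to resolve, mirroring how the `databricks` variant maps values to `<scope_name>/<secret_key>`; the variable names themselves are illustrative:

```python
import os

from databricks.labs.lakebridge.config import ReconcileCredentialConfig

# Hypothetical environment variables holding the Snowflake credentials
# (set here only to make the sketch self-contained).
os.environ.setdefault("SF_URL", "account.snowflakecomputing.com")
os.environ.setdefault("SF_USER", "app")
os.environ.setdefault("SF_PASSWORD", "change-me")
os.environ.setdefault("SF_ROLE", "app")

# Assumption: with vault_type="env", each value is the name of the
# environment variable that holds the actual connection property.
env_creds = ReconcileCredentialConfig(
    vault_type="env",
    source_creds={
        "sfUrl": "SF_URL",
        "sfUser": "SF_USER",
        "sfPassword": "SF_PASSWORD",
        "sfRole": "SF_ROLE",
    },
)
```

The resulting `env_creds` object would then be passed as `creds_or_secret_scope`, exactly as in the examples above.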

## Configure Table Properties

2 changes: 1 addition & 1 deletion docs/lakebridge/docs/reconcile/reconcile_automation.mdx
@@ -116,7 +116,7 @@ To run the utility, the following parameters must be set:
- `remorph_catalog`: The catalog configured through CLI.
- `remorph_schema`: The schema configured through CLI.
- `remorph_config_table`: The table configs created as a part of the pre-requisites.
- `secret_scope`: The Databricks secret scope for accessing the source system. Refer to the Lakebridge documentation for the specific keys required to be configured as per the source system.
- `secret_scope`: (Deprecated) The Databricks secret scope for accessing the source system. Refer to the Lakebridge documentation for the specific keys required to be configured as per the source system.
- `source_system`: The source system against which reconciliation is performed.
- `table_recon_summary`: The target summary table created as a part of the pre-requisites.

13 changes: 0 additions & 13 deletions src/databricks/labs/lakebridge/cli.py
@@ -27,7 +27,6 @@
from databricks.labs.lakebridge.config import TranspileConfig, LSPConfigOptionV1
from databricks.labs.lakebridge.contexts.application import ApplicationContext
from databricks.labs.lakebridge.connections.credential_manager import cred_file
from databricks.labs.lakebridge.helpers.recon_config_utils import ReconConfigPrompts
from databricks.labs.lakebridge.helpers.telemetry_utils import make_alphanum_or_semver
from databricks.labs.lakebridge.install import installer
from databricks.labs.lakebridge.reconcile.runner import ReconcileRunner
@@ -699,18 +698,6 @@ def generate_lineage(
    lineage_generator(engine, source_dialect, input_source, output_folder)


@lakebridge.command
def configure_secrets(*, w: WorkspaceClient) -> None:
"""Setup reconciliation connection profile details as Secrets on Databricks Workspace"""
recon_conf = ReconConfigPrompts(w)

# Prompt for source
source = recon_conf.prompt_source()

logger.info(f"Setting up Scope, Secrets for `{source}` reconciliation")
recon_conf.prompt_and_save_connection_details()


@lakebridge.command
def configure_database_profiler(w: WorkspaceClient) -> None:
"""[Experimental] Installs and runs the Lakebridge Assessment package for database profiling"""