diff --git a/README.md b/README.md index bc76b25e..9c5c416b 100755 --- a/README.md +++ b/README.md @@ -29,13 +29,13 @@ In practice, a single generic pipeline reads the Dataflowspec and uses it to orc - Capture [Data Quality Rules](https://github.com/databrickslabs/dlt-meta/tree/main/examples/dqe/customers/bronze_data_quality_expectations.json) - Capture processing logic as sql in [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json) -#### Generic DLT pipeline +#### Generic Lakeflow Declarative Pipeline - Apply appropriate readers based on input metadata - Apply data quality rules with DLT expectations - Apply CDC apply changes if specified in metadata -- Builds DLT graph based on input/output metadata -- Launch DLT pipeline +- Builds Lakeflow Declarative Pipeline graph based on input/output metadata +- Launch Lakeflow Declarative Pipeline ## High-Level Process Flow: @@ -53,14 +53,15 @@ In practice, a single generic pipeline reads the Dataflowspec and uses it to orc | Custom transformations | Bronze, Silver layer accepts custom functions| | Data Quality Expectations Support | Bronze, Silver layer | | Quarantine table support | Bronze layer | -| [apply_changes](https://docs.databricks.com/en/delta-live-tables/python-ref.html#cdc) API support | Bronze, Silver layer | -| [apply_changes_from_snapshot](https://docs.databricks.com/en/delta-live-tables/python-ref.html#change-data-capture-from-database-snapshots-with-python-in-delta-live-tables) API support | Bronze layer| +| [create_auto_cdc_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes) API support | Bronze, Silver layer | +| [create_auto_cdc_from_snapshot_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes-from-snapshot) API support | Bronze layer| | [append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#use-append-flow-to-write-to-a-streaming-table-from-multiple-source-streams) API support | Bronze layer| | Liquid cluster support | Bronze, Bronze Quarantine, Silver tables| | [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) | ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` | | Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode | -| [DLT Sinks](https://docs.databricks.com/aws/en/delta-live-tables/dlt-sinks) |Supported formats:external ```delta table```, ```kafka```.Bronze, Silver layers| +| [create_sink](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-sink) API support | Supported formats: external ```delta table```, ```kafka```. Bronze, Silver layers | | [Databricks Asset Bundles](https://docs.databricks.com/aws/en/dev-tools/bundles/) | Supported +| [DLT-META UI](https://github.com/databrickslabs/dlt-meta/tree/main/lakehouse_app#dlt-meta-lakehouse-app-setup) | Uses the DLT-META Databricks Lakehouse App | ## Getting Started @@ -137,38 +138,37 @@ If you want to run existing demo files please follow these steps before running dlt_meta_home=$(pwd) export PYTHONPATH=$dlt_meta_home ``` +![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif) + + 7. Run onboarding command: ```commandline databricks labs dlt-meta onboard ``` -![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif) - -Above commands will prompt you to provide onboarding details. If you have cloned dlt-meta git repo then accept defaults which will launch config from demo folder. 
+The command will prompt you to provide onboarding details. If you have cloned the dlt-meta repository, you can accept the default values which will use the configuration from the demo folder. ![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif) - -- Goto your databricks workspace and located onboarding job under: Workflow->Jobs runs +The above `onboard` CLI command will: +1. Push code and data to your Databricks workspace +2. Create an onboarding job +3. Display a success message: ```Job created successfully. job_id={job_id}, url=https://{databricks workspace url}/jobs/{job_id}``` +4. Open the job URL automatically in your default browser. ### Deploy using dlt-meta CLI: -- Once onboarding jobs is finished deploy `bronze` and `silver` DLT using below command +- Once the onboarding job has finished, deploy the Lakeflow Declarative Pipeline using the command below - ```commandline databricks labs dlt-meta deploy ``` -- - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps -- - Bronze DLT - -![deployingDLTMeta_bronze.gif](docs/static/images/deployingDLTMeta_bronze.gif) +The command will prompt you to provide pipeline configuration details. +![deployingDLTMeta_bronze_silver.gif](docs/static/images/deployingDLTMeta_bronze_silver.gif) -- Silver DLT -- - ```commandline - databricks labs dlt-meta deploy - ``` -- - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps - -![deployingDLTMeta_silver.gif](docs/static/images/deployingDLTMeta_silver.gif) +The above `deploy` CLI command will: +1. Deploy a Lakeflow Declarative Pipeline with the dlt-meta configuration (```layer```, ```group```, ```dataflowSpec table details```, etc.) to your Databricks workspace +2. Display the message: ```dlt-meta pipeline={pipeline_id} created and launched with update_id={pipeline_update_id}, url=https://{databricks workspace url}/#joblist/pipelines/{pipeline_id}``` +3. Open the pipeline URL automatically in your default browser. ## More questions diff --git a/docs/content/app/_index.md b/docs/content/app/_index.md deleted file mode 100644 index 55349ab4..00000000 --- a/docs/content/app/_index.md +++ /dev/null @@ -1,38 +0,0 @@ -**DLT-META App Setup** - -Make sure you have installed/upgraded the latest databricks cli version e.g. 0.244.0 and workspace access is configured where the app is deploying. - -**Create App and attach source to databricks apps** - -**Step1**. Create Custom app (“empty”) using cli e.g. app name is demo-dltmeta - **databricks apps create demo-dltmeta** - Wait to complete the command execution. It will take a few minutes. -**Step2.** Checkout project from dlt meta git repository -**git clone [https://github.com/databrickslabs/dlt-meta.git](https://github.com/databrickslabs/dlt-meta.git)** -**Step3.** **cd dlt-meta/lakehouse\_app** -**Step4.** Sync DLT-META app code to your workspace directory run below command to sync code “**testapp** is folder name, you can name as per your choise” - **databricks sync . /Workspace/Users/\@databricks.com/testapp** -**Step5.** Deploy code to app created in step1 - **databricks apps deploy demo-dltmeta \--source-code-path /Workspace/Users/\@[databricks.com/testapp](http://databricks.com/testapp)** -Step6. Open the url from step1 log or go to Databricks web page click **New \> app \> click back on App \> search** by your app name and click on url to open the app in browser. 
- -**Run the App at Local** - -**Step1.** Checkout project from dlt meta git repository -**git clone [https://github.com/databrickslabs/dlt-meta.git](https://github.com/databrickslabs/dlt-meta.git)** -**Step2.** **cd dlt-meta/lakehouse\_app** -**Step3.** **pip install requirements.txt** to deploy dependencies -**Step4.** **databricks configure –host \ –token \** -**Step5.** Run command **python App.py** -**Step6.** Click on url link **http://127.0.0.1:5000** - -**How to Use DLT-META App** - -Databricks apps create user name per app and can be found under databricks app page -![][image1] -This user will be used to onboard, deploy, and run demos for selected UC catalogs and schemas. It requires specific permissions to be added to this user to grant access to those UC catalogs and schemas. For example, the username here is "app-40zbx9\_demo-dltmeta". -**Step1.** After launching the app in the browser click the button “**setup dlt-meta project environment**” that will setup dlt meta environment at app remote instance to process onboarding and deployment activities. -**Step2.** To onboard a dlt pipeline use the **“UI”** tab to onboard and deploy dlt pipelines as per your pipeline configuration. -**Step3.** To run the available demos under “**Demo**” tab - -[image1]: \ No newline at end of file diff --git a/docs/content/demo/Append_FLOW_CF.md b/docs/content/demo/Append_FLOW_CF.md index f90b4012..a77396c9 100644 --- a/docs/content/demo/Append_FLOW_CF.md +++ b/docs/content/demo/Append_FLOW_CF.md @@ -21,15 +21,26 @@ This demo will perform following tasks: databricks auth login --host WORKSPACE_HOST ``` -3. ```commandline +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline git clone https://github.com/databrickslabs/dlt-meta.git ``` -4. ```commandline +5. Navigate to project directory: + ```commandline cd dlt-meta ``` -5. Set python environment variable into terminal +6. Set python environment variable into terminal ```commandline dlt_meta_home=$(pwd) ``` @@ -38,7 +49,8 @@ This demo will perform following tasks: export PYTHONPATH=$dlt_meta_home ``` -6. ```commandline +7. Run the command: + ```commandline python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=dlt_meta_uc ``` diff --git a/docs/content/demo/Append_FLOW_EH.md b/docs/content/demo/Append_FLOW_EH.md index 00ea4712..cbf503ac 100644 --- a/docs/content/demo/Append_FLOW_EH.md +++ b/docs/content/demo/Append_FLOW_EH.md @@ -18,21 +18,32 @@ draft: false databricks auth login --host WORKSPACE_HOST ``` -3. ```commandline +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline git clone https://github.com/databrickslabs/dlt-meta.git ``` -4. ```commandline +5. Navigate to project directory: + ```commandline cd dlt-meta ``` -5. Set python environment variable into terminal +6. Set python environment variable into terminal ```commandline dlt_meta_home=$(pwd) ``` ```commandline export PYTHONPATH=$dlt_meta_home ``` -6. Eventhub +7. 
Configure Eventhub - Needs eventhub instance running - Need two eventhub topics first for main feed (eventhub_name) and second for append flow feed (eventhub_name_append_flow) - Create databricks secrets scope for eventhub keys @@ -61,7 +72,8 @@ draft: false - eventhub_secrets_scope_name: Databricks secret scope name e.g. eventhubs_dltmeta_creds - eventhub_port: Eventhub port -7. ```commandline +8. Run the command: + ```commandline python demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --uc_catalog_name=dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=dltmeta_eventhub_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey ``` diff --git a/docs/content/demo/Apply_Changes_From_Snapshot.md b/docs/content/demo/Apply_Changes_From_Snapshot.md index 8006ec74..ee294276 100644 --- a/docs/content/demo/Apply_Changes_From_Snapshot.md +++ b/docs/content/demo/Apply_Changes_From_Snapshot.md @@ -26,21 +26,33 @@ draft: false databricks auth login --host WORKSPACE_HOST ``` -3. ```commandline +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline git clone https://github.com/databrickslabs/dlt-meta.git ``` -4. ```commandline +5. Navigate to project directory: + ```commandline cd dlt-meta ``` -5. Set python environment variable into terminal +6. Set python environment variable into terminal ```commandline dlt_meta_home=$(pwd) ``` ```commandline export PYTHONPATH=$dlt_meta_home -6. ```commandline +7. Run the command: + ```commandline python demo/launch_acfs_demo.py --uc_catalog_name=<> ``` - uc_catalog_name : Unity catalog name diff --git a/docs/content/demo/DAB.md b/docs/content/demo/DAB.md new file mode 100644 index 00000000..f50619c7 --- /dev/null +++ b/docs/content/demo/DAB.md @@ -0,0 +1,98 @@ +--- +title: "DAB Demo" +date: 2024-02-26T14:25:26-04:00 +weight: 28 +draft: false +--- + +### DAB Demo + +## Overview +This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-Meta: + +This demo will perform following steps: +- Create dlt-meta schema's for dataflowspec and bronze/silver layer +- Upload necessary resources to unity catalog volume +- Create DAB files with catalog, schema, file locations populated +- Deploy DAB to databricks workspace +- Run onboarding using DAB commands +- Run Bronze/Silver Pipelines using DAB commands +- Demo examples will showcase fan-out pattern in silver layer +- Demo example will show case custom transformations for bronze/silver layers +- Adding custom columns and metadata to Bronze tables +- Implementing SCD Type 1 to Silver tables +- Applying expectations to filter data in Silver tables + +### Steps: +1. Launch Command Prompt + +2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) + - Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace: + + ```commandline + databricks auth login --host WORKSPACE_HOST + ``` + +3. 
Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline + git clone https://github.com/databrickslabs/dlt-meta.git + ``` + +5. Navigate to project directory: + ```commandline + cd dlt-meta + ``` + +6. Set python environment variable into terminal: + ```commandline + dlt_meta_home=$(pwd) + export PYTHONPATH=$dlt_meta_home + ``` + +7. Generate DAB resources and set up schemas: + This command will: + - Generate DAB configuration files + - Create DLT-Meta schemas + - Upload necessary files to volumes + ```commandline + python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name= --profile= + ``` + > Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token. + +8. Deploy and run the DAB bundle: + - Navigate to the DAB directory: + ```commandline + cd demo/dabs + ``` + + - Validate the bundle configuration: + ```commandline + databricks bundle validate --profile= + ``` + + - Deploy the bundle to dev environment: + ```commandline + databricks bundle deploy --target dev --profile= + ``` + + - Run the onboarding job: + ```commandline + databricks bundle run onboard_people -t dev --profile= + ``` + + - Execute the pipelines: + ```commandline + databricks bundle run execute_pipelines_people -t dev --profile= + ``` + +![dab_onboarding_job.png](/images/dab_onboarding_job.png) +![dab_dlt_pipelines.png](/images/dab_dlt_pipelines.png) diff --git a/docs/content/demo/DAIS.md b/docs/content/demo/DAIS.md index 4fd468d9..b1f7ec48 100644 --- a/docs/content/demo/DAIS.md +++ b/docs/content/demo/DAIS.md @@ -23,15 +23,26 @@ This demo showcases DLT-META's capabilities of creating Bronze and Silver DLT pi databricks auth login --host WORKSPACE_HOST ``` -3. ```commandline +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline git clone https://github.com/databrickslabs/dlt-meta.git ``` -4. ```commandline +5. Navigate to project directory: + ```commandline cd dlt-meta ``` -5. Set python environment variable into terminal +6. Set python environment variable into terminal ```commandline dlt_meta_home=$(pwd) ``` @@ -39,7 +50,8 @@ This demo showcases DLT-META's capabilities of creating Bronze and Silver DLT pi export PYTHONPATH=$dlt_meta_home ``` -6. ```commandline +7. 
Run the command: + ```commandline python demo/launch_dais_demo.py --uc_catalog_name=<> --cloud_provider_name=<<>> ``` - uc_catalog_name : unit catalog name diff --git a/docs/content/demo/DLT_Sink.md b/docs/content/demo/DLT_Sink.md new file mode 100644 index 00000000..3e851f37 --- /dev/null +++ b/docs/content/demo/DLT_Sink.md @@ -0,0 +1,74 @@ +--- +title: "Lakeflow Declarative Pipelines Sink Demo" +date: 2024-02-26T14:25:26-04:00 +weight: 27 +draft: false +--- + +### Lakeflow Declarative Pipelines Sink Demo +This demo will perform following steps: +- Showcase onboarding process for dlt writing to external sink pattern +- Run onboarding for the bronze iot events +- Publish test events to kafka topic +- Run Bronze Lakeflow Declarative Pipelines which will read from kafka source topic and write to: + - Events delta table into UC + - Create quarantine table as per data quality expectations + - Writes to external kafka topics + - Writes to external dbfs location as external delta sink + +### Steps: +1. Launch Command Prompt + +2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) + - Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace: + + ```commandline + databricks auth login --host WORKSPACE_HOST + ``` + +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline + git clone https://github.com/databrickslabs/dlt-meta.git + ``` + +5. Navigate to project directory: + ```commandline + cd dlt-meta + ``` + +6. Set python environment variable into terminal: + ```commandline + dlt_meta_home=$(pwd) + export PYTHONPATH=$dlt_meta_home + ``` + +7. Configure Kafka (Optional): + If you are using secrets for kafka, create databricks secrets scope for source and sink kafka: + ```commandline + databricks secrets create-scope <> + ``` + ```commandline + databricks secrets put-secret --json '{ + "scope": "<>", + "key": "<>", + "string_value": "<>" + }' + ``` + +8. Run the command: + ```commandline + python demo/launch_dlt_sink_demo.py --uc_catalog_name=<> --source=kafka --kafka_source_topic=<> --kafka_sink_topic=<> --kafka_source_servers_secrets_scope_name=<> --kafka_source_servers_secrets_scope_key=<> --kafka_sink_servers_secret_scope_name=<> --kafka_sink_servers_secret_scope_key=<> --profile=<> + ``` + +![dlt_demo_sink.png](/images/dlt_demo_sink.png) +![dlt_delta_sink.png](/images/dlt_delta_sink.png) +![dlt_kafka_sink.png](/images/dlt_kafka_sink.png) diff --git a/docs/content/demo/Silver_Fanout.md b/docs/content/demo/Silver_Fanout.md index 8b57b9cf..6e5cf4eb 100644 --- a/docs/content/demo/Silver_Fanout.md +++ b/docs/content/demo/Silver_Fanout.md @@ -23,31 +23,43 @@ draft: false databricks auth login --host WORKSPACE_HOST ``` -3. ```commandline +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline git clone https://github.com/databrickslabs/dlt-meta.git ``` -4. ```commandline +5. Navigate to project directory: + ```commandline cd dlt-meta ``` -5. Set python environment variable into terminal +6. 
Set python environment variable into terminal ```commandline dlt_meta_home=$(pwd) ``` ```commandline export PYTHONPATH=$dlt_meta_home -6. ```commandline +7. Run the command: + ```commandline python demo/launch_silver_fanout_demo.py --uc_catalog_name=<> --cloud_provider_name=aws ``` - uc_catalog_name : aws or azure - cloud_provider_name : aws or azure - you can provide `--profile=databricks_profile name` in case you already have databricks cli otherwise command prompt will ask host and token. - - - 6a. Databricks Workspace URL: - - - Enter your workspace URL, with the format https://.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs. + a. Databricks Workspace URL: + Enter your workspace URL, with the format https://.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs. - - - 6b. Token: + b. Token: - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop down. - On the Access tokens tab, click Generate new token. diff --git a/docs/content/demo/Techsummit.md b/docs/content/demo/Techsummit.md index d53b71da..996ae76e 100644 --- a/docs/content/demo/Techsummit.md +++ b/docs/content/demo/Techsummit.md @@ -17,15 +17,26 @@ This demo will launch auto generated tables(100s) inside single bronze and silve databricks auth login --host WORKSPACE_HOST ``` -3. ```commandline +3. Install Python package requirements: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + ``` + +4. Clone dlt-meta: + ```commandline git clone https://github.com/databrickslabs/dlt-meta.git ``` -4. ```commandline +5. Navigate to project directory: + ```commandline cd dlt-meta ``` -5. Set python environment variable into terminal +6. Set python environment variable into terminal ```commandline dlt_meta_home=$(pwd) ``` @@ -33,7 +44,10 @@ This demo will launch auto generated tables(100s) inside single bronze and silve export PYTHONPATH=$dlt_meta_home ``` -6. Run the command ```python demo/launch_techsummit_demo.py --uc_catalog_name=<> --cloud_provider_name=aws ``` +7. Run the command: + ```commandline + python demo/launch_techsummit_demo.py --uc_catalog_name=<> --cloud_provider_name=aws + ``` - uc_catalog_name : Unity Catalog name - cloud_provider_name : aws or azure - you can provide `--profile=databricks_profile name` in case you already have databricks cli otherwise command prompt will ask host and token diff --git a/docs/content/demo/_index.md b/docs/content/demo/_index.md index 52cc5891..ba60b748 100644 --- a/docs/content/demo/_index.md +++ b/docs/content/demo/_index.md @@ -10,4 +10,6 @@ draft: false 3. **Append FLOW Autoloader Demo**: Write to same target from multiple sources using append_flow and adding file metadata using [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html) 4. **Append FLOW Eventhub Demo**: Write to same target from multiple sources using append_flow and adding using [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html) 5. **Silver Fanout Demo**: This demo will showcase fanout architecture can be implemented in silver layer - 6. 
**Apply Changes From Snapshot Demo**: This demo will showcase [create_auto_cdc_from_snapshot_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes-from-snapshot) can be implemented inside bronze and silver layer \ No newline at end of file + 6. **Apply Changes From Snapshot Demo**: This demo will showcase [create_auto_cdc_from_snapshot_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes-from-snapshot) can be implemented inside bronze and silver layer + 7. **Lakeflow Declarative Pipelines Sink Demo**: This demo showcases the implementation of write to external sinks like delta and kafka + 8. **DAB Demo**: This demo showcases how to use Databricks Assets Bundles with dlt-meta \ No newline at end of file diff --git a/docs/content/faq/app_faq.md b/docs/content/faq/app_faq.md new file mode 100644 index 00000000..37a4c295 --- /dev/null +++ b/docs/content/faq/app_faq.md @@ -0,0 +1,44 @@ +--- +title: "App" +date: 2021-08-04T14:26:55-04:00 +weight: 63 +draft: false +--- + +### Initial Setup + +**Q1. Do I need to run an initial setup before using the DLT-META App?** + +Yes. Before using the DLT-META App, you must click the Setup button to create the required dlt-meta environment. This initializes the app and enables you to onboard or manage Lakeflow Declarative Pipelines. + +### Features and Capabilities + +**Q2. What are the main features of the DLT-META App?** + +The DLT-META App provides several key capabilities: +- Onboard new Lakeflow Declarative Pipeline through an interactive interface +- Deploy and manage pipelines directly in the app +- Run demo flows to explore example pipelines and usage patterns +- Use the command-line interface (CLI) to automate operations + +### Access and Permissions + +**Q3. Who can access and use the DLT-META App?** + +Only authenticated Databricks workspace users with appropriate permissions can access and use the app: +- You need `CAN_USE` permission to run the app +- You need `CAN_MANAGE` permission to administer it +- The app can be shared within your workspace or account +- Every user must log in with their Databricks account credentials + +### Resource Access + +**Q4. How does catalog and schema access work in the DLT-META App?** + +By default, the app uses a dedicated Service Principal (SP) for all data and resource access: +- The Service Principal needs explicit permissions (`USE CATALOG`, `USE SCHEMA`, `SELECT`) on all Unity Catalog resources +- User abilities depend on the Service Principal's access, regardless of URL +- Optional On-Behalf-Of (OBO) mode uses individual user permissions + + + diff --git a/docs/content/getting_started/app.md b/docs/content/getting_started/app.md new file mode 100644 index 00000000..000953fa --- /dev/null +++ b/docs/content/getting_started/app.md @@ -0,0 +1,116 @@ +--- +title: "DLT-META Lakehouse App" +date: 2025-08-31T14:25:26-04:00 +weight: 9 +draft: false +--- + + +## Prerequisites + +### System Requirements +- Python 3.8.0 or higher +- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) (latest version, e.g., 0.244.0) +- Configured workspace access + +### Initial Setup +1. Authenticate with Databricks: + ```commandline + databricks auth login --host WORKSPACE_HOST + ``` + +2. Setup Python Environment: + ```commandline + git clone https://github.com/databrickslabs/dlt-meta.git + cd dlt-meta + python -m venv .venv + source .venv/bin/activate + pip install databricks-sdk + ``` + +## Deployment Options + +### Deploy to Databricks + +1. 
Create Custom App: + ```commandline + databricks apps create demo-dltmeta + ``` + > Note: Wait for command completion (a few minutes) + +2. Setup App Code: + ```commandline + cd dlt-meta/lakehouse_app + + # Replace testapp with your preferred folder name + databricks sync . /Workspace/Users/@databricks.com/testapp + + # Deploy the app + databricks apps deploy demo-dltmeta --source-code-path /Workspace/Users/@databricks.com/testapp + ``` + +3. Access the App: + - Open URL from step 1 log, or + - Navigate: Databricks Web UI → New → App → Back to App → Search your app name + +### Run Locally + +1. Setup Environment: + ```commandline + cd dlt-meta/lakehouse_app + pip install -r requirements.txt + ``` + +2. Configure Databricks: + ```commandline + databricks configure --host --token + ``` + +3. Start App: + ```commandline + python App.py + ``` + Access at: http://127.0.0.1:5000 + +## Using DLT-META App + +### App User Setup +![App User Example](/images/app_cli.png) + +The app creates a dedicated user account that: +- Handles onboarding, deployment, and demo execution +- Requires specific permissions for UC catalogs and schemas +- Example username format: "app-40zbx9_demo-dltmeta" + +### Getting Started + +1. Initial Setup: + - Launch app in browser + - Click "Setup dlt-meta project environment" + - This initializes the environment for onboarding and deployment + +2. Pipeline Management: + - Use "UI" tab to onboard and deploy pipelines + - Configure pipelines according to your requirements + + **Onboarding Pipeline:** + ![Onboarding UI](/images/app_onboarding.png) + *Pipeline onboarding interface for configuring new data pipelines* + + **Deploying Pipeline:** + ![Deploy UI](/images/app_deploy_pipeline.png) + *Pipeline deployment interface for managing and deploying pipelines* + +3. Demo Access: + - Available demos can be found under "Demo" tab + - Run pre-configured demo pipelines to explore features + + ![App Demo](/images/app_run_demos.png) + *Demo interface showing available example pipelines* + +4. Command Line Interface: + - Access CLI features under the "CLI" tab + - Execute commands directly from the web interface + + ![CLI UI](/images/app_cli.png) + *CLI interface for command-line operations* diff --git a/docs/content/getting_started/dltmeta_cli.md b/docs/content/getting_started/dltmeta_cli.md index 3aae6329..93e3155e 100644 --- a/docs/content/getting_started/dltmeta_cli.md +++ b/docs/content/getting_started/dltmeta_cli.md @@ -5,20 +5,58 @@ weight: 7 draft: false --- -### pre-requisites: -- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) -- Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace: - - ```commandline - databricks auth login --host WORKSPACE_HOST - ``` +### Prerequisites: - Python 3.8.0 + -##### Steps: -1. ``` git clone https://github.com/databrickslabs/dlt-meta.git ``` -2. ``` cd dlt-meta ``` -3. ``` python -m venv .venv ``` -4. ```source .venv/bin/activate ``` -5. ``` pip install databricks-sdk ``` +- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) + +### Steps: +1. Install and authenticate Databricks CLI: + ```commandline + databricks auth login --host WORKSPACE_HOST + ``` + +2. Install dlt-meta via Databricks CLI: + ```commandline + databricks labs install dlt-meta + ``` + +3. Clone dlt-meta repository: + ```commandline + git clone https://github.com/databrickslabs/dlt-meta.git + ``` + +4. Navigate to project directory: + ```commandline + cd dlt-meta + ``` + +5. 
Create Python virtual environment: + ```commandline + python -m venv .venv + ``` + +6. Activate virtual environment: + ```commandline + source .venv/bin/activate + ``` + +7. Install required packages: + ```commandline + # Core requirements + pip install "PyYAML>=6.0" setuptools databricks-sdk + + # Development requirements + pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5 + + # Integration test requirements + pip install "typer[all]==0.6.1" + ``` + +8. Set environment variables: + ```commandline + dlt_meta_home=$(pwd) + export PYTHONPATH=$dlt_meta_home + ``` ![onboardingDLTMeta.gif](/images/onboardingDLTMeta.gif) @@ -27,32 +65,32 @@ ```shell databricks labs dlt-meta onboard ``` -- Above command will prompt you to provide onboarding details. -- If you have cloned dlt-meta git repo then accepting defaults will launch config from [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf) folder. -- You can create onboarding files e.g onboarding.json, data quality and silver transformations and put it in conf folder as show in [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf) +- The command will prompt you to provide onboarding details. If you have cloned the dlt-meta repository, you can accept the default values which will use the configuration from the demo folder. ![onboardingDLTMeta_2.gif](/images/onboardingDLTMeta_2.gif) -![onboardingDLTMeta.gif](/images/onboardingDLTMeta.gif) +- The above `onboard` CLI command will: + 1. Push code and data to your Databricks workspace + 2. Create an onboarding job + 3. Display a success message: ```Job created successfully. job_id={job_id}, url=https://{databricks workspace url}/jobs/{job_id}``` + 4. Open the job URL automatically in your default browser. + - Once the onboarding job has finished, deploy the `bronze` and `silver` layers using the command below -## Dataflow DLT Pipeline: +## DLT-META Lakeflow Declarative Pipeline: -#### Deploy Bronze DLT +#### Deploy ```Bronze``` and ```Silver``` layers into a single pipeline ```shell databricks labs dlt-meta deploy ``` -- Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps +- The above command will prompt you for pipeline details. Provide the details for the schema you specified in the earlier steps -![deployingDLTMeta_bronze.gif](/images/deployingDLTMeta_bronze.gif) +![deployingDLTMeta_bronze_silver.gif](/images/deployingDLTMeta_bronze_silver.gif) -#### Deploy Silver DLT - ```shell - databricks labs dlt-meta deploy -``` -- - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps +- The above `deploy` CLI command will: + 1. Deploy a Lakeflow Declarative Pipeline with the dlt-meta configuration (```layer```, ```group```, ```dataflowSpec table details```, etc.) to your Databricks workspace + 2. Display the message: ```dlt-meta pipeline={pipeline_id} created and launched with update_id={pipeline_update_id}, url=https://{databricks workspace url}/#joblist/pipelines/{pipeline_id}``` + 3. Open the pipeline URL automatically in your default browser. 
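+- Under the hood, the pipeline that `deploy` creates runs dlt-meta's generic pipeline code rather than a hand-written notebook. As a rough illustrative sketch (the module path and function name below assume the current dlt-meta repository layout and may differ in your version), the pipeline entry point does little more than:
+
+```python
+# Illustrative sketch only: generic Lakeflow Declarative Pipeline entry point.
+# The "layer" value (bronze, silver, or bronze_silver) is supplied as a pipeline
+# configuration by the deploy command; everything else is read from the
+# DataflowSpec tables captured during onboarding.
+layer = spark.conf.get("layer", None)
+
+from src.dataflow_pipeline import DataflowPipeline
+DataflowPipeline.invoke_dlt_pipeline(spark, layer)
+```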
-![deployingDLTMeta_silver.gif](/images/deployingDLTMeta_silver.gif) -- Goto your databricks workspace and located onboarding job under: Workflow->Jobs runs diff --git a/docs/content/getting_started/metadatapreperation.md b/docs/content/getting_started/metadatapreperation.md index 1ae1efb0..498507d3 100644 --- a/docs/content/getting_started/metadatapreperation.md +++ b/docs/content/getting_started/metadatapreperation.md @@ -64,8 +64,7 @@ The `onboarding.json` file contains links to [silver_transformations.json](https | silver_transformation_json | Silver table sql transformation json path | | silver_data_quality_expectations_json_{env} | Silver table data quality expectations json file path | silver_append_flows | Silver table append flows json. e.g.`"silver_append_flows":[{"name":"customer_bronze_flow", -| silver_apply_changes_from_snapshot | Silver apply changes from snapshot Json e.g. Mandatory fields: keys=["userId"], scd_type=`1` or `2` optional fields: track_history_column_list=`[col1]`, track_history_except_column_list=`[col2]` | -"create_streaming_table": false,"source_format": "cloudFiles", "source_details": {"source_database": "APP","source_table":"CUSTOMERS", "source_path_dev": "tests/resources/data/customers", "source_schema_path": "tests/resources/schema/customer_schema.ddl"},"reader_options": {"cloudFiles.format": "json","cloudFiles.inferColumnTypes": "true","cloudFiles.rescuedDataColumn": "_rescued_data"},"once": true}]`| +| silver_apply_changes_from_snapshot | Silver apply changes from snapshot Json e.g. Mandatory fields: keys=["userId"], scd_type=`1` or `2` optional fields: track_history_column_list=`[col1]`, track_history_except_column_list=`[col2]`| diff --git a/docs/content/releases/_index.md b/docs/content/releases/_index.md index b41613b4..d868386a 100644 --- a/docs/content/releases/_index.md +++ b/docs/content/releases/_index.md @@ -4,9 +4,27 @@ date: 2021-08-04T14:50:11-04:00 weight: 80 draft: false --- +# v0.0.10 +## Enhancements +- Added apply_changes_from_snapshot support in silver layer [PR](https://github.com/databrickslabs/dlt-meta/pull/187) +- Added UI using databricks lakehouse app for onboarding/deploy commands [PR](https://github.com/databrickslabs/dlt-meta/pull/168) +- Added support for non-Delta as sinks(delta, kafka) [PR](https://github.com/databrickslabs/dlt-meta/pull/157) +- Added quarantine support in silver layer for data quality rules [PR](https://github.com/databrickslabs/dlt-meta/pull/191) +- Added support for table comments, column comments, and cluster_by [PR](https://github.com/databrickslabs/dlt-meta/pull/91) +- Added catalog support for sourceDetails and targetDetails [PR](https://github.com/databrickslabs/dlt-meta/issues/173) +- Added DBDemos for dlt-meta [PR](https://github.com/databrickslabs/dlt-meta/issues/183) +- Added YAML support for onboarding [PR](https://github.com/databrickslabs/dlt-meta/issues/184) +- Fixed issue cluster by not working with bronze append only table [PR](https://github.com/databrickslabs/dlt-meta/issues/197) +- Fixed issue view name containing period when using DPM [PR](https://github.com/databrickslabs/dlt-meta/issues/169) +- Fixed issue CLI onboarding overwrite option always set to True [PR](https://github.com/databrickslabs/dlt-meta/issues/163) +- Fixed issue Silver DLT not creating based on passed database [PR](https://github.com/databrickslabs/dlt-meta/issues/160) +- Fixed issue PyPI download stats display [PR](https://github.com/databrickslabs/dlt-meta/issues/200) +- Fixed issue Silver Data Quality not 
working [PR](https://github.com/databrickslabs/dlt-meta/issues/156) +- Fixed issue Removed DPM flag check inside dataflowpipeline [PR](https://github.com/databrickslabs/dlt-meta/issues/177) +- Fixed issue Updated dlt-meta demos into Delta Live Tables Notebook github [PR](https://github.com/databrickslabs/dlt-meta/issues/158) # v0.0.9 -## Enhancement +## Enhancements - Added apply_changes_from_snapshot api support in bronze layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/124) - Added dlt append_flow api support for silver layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/63) - Added support for file metadata columns for autoloader: [PR](https://github.com/databrickslabs/dlt-meta/pull/56) @@ -27,7 +45,7 @@ draft: false - Fixed issue Onboarding with multiple partition columns errors out: [PR](https://github.com/databrickslabs/dlt-meta/pull/134) # v0.0.8 -## Enhancement +## Enhancements - Added dlt append_flow api support: [PR](https://github.com/databrickslabs/dlt-meta/pull/58) - Added dlt append_flow api support for silver layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/63) - Added support for file metadata columns for autoloader: [PR](https://github.com/databrickslabs/dlt-meta/pull/56) @@ -45,14 +63,14 @@ draft: false # v0.0.7 -## Enhancement +## Enhancements ### 1. Mismatched Keys: Update read_dlt_delta() with key "source_database" instead of "database" [#33](https://github.com/databrickslabs/dlt-meta/pull/33) ### 2. Create dlt-meta cli documentation #45 - Readme and docs to include above features # v0.0.6 -## Enhancement +## Enhancements ### 1. Migrate to create_streaming_table api from create_streaming_live_table [#37](https://github.com/databrickslabs/dlt-meta/pull/39) #### Updates - Readme and docs to include above features diff --git a/docs/static/images/app_cli.png b/docs/static/images/app_cli.png new file mode 100644 index 00000000..768760a4 Binary files /dev/null and b/docs/static/images/app_cli.png differ diff --git a/docs/static/images/app_deploy_pipeline.png b/docs/static/images/app_deploy_pipeline.png new file mode 100644 index 00000000..aeac5137 Binary files /dev/null and b/docs/static/images/app_deploy_pipeline.png differ diff --git a/docs/static/images/app_onboarding.png b/docs/static/images/app_onboarding.png new file mode 100644 index 00000000..c90a32da Binary files /dev/null and b/docs/static/images/app_onboarding.png differ diff --git a/docs/static/images/app_run_demos.png b/docs/static/images/app_run_demos.png new file mode 100644 index 00000000..43d538d5 Binary files /dev/null and b/docs/static/images/app_run_demos.png differ diff --git a/docs/static/images/deployingDLTMeta_bronze_silver.gif b/docs/static/images/deployingDLTMeta_bronze_silver.gif new file mode 100644 index 00000000..cd60279d Binary files /dev/null and b/docs/static/images/deployingDLTMeta_bronze_silver.gif differ diff --git a/docs/static/images/onboardingDLTMeta.gif b/docs/static/images/onboardingDLTMeta.gif index 50ddc8ba..5ca00cf6 100644 Binary files a/docs/static/images/onboardingDLTMeta.gif and b/docs/static/images/onboardingDLTMeta.gif differ diff --git a/docs/static/images/onboardingDLTMeta_2.gif b/docs/static/images/onboardingDLTMeta_2.gif index 70156bdf..2c95d764 100644 Binary files a/docs/static/images/onboardingDLTMeta_2.gif and b/docs/static/images/onboardingDLTMeta_2.gif differ diff --git a/integration_tests/notebooks/kafka_runners/publish_events.py b/integration_tests/notebooks/kafka_runners/publish_events.py index dec0cb4d..575ce671 100644 --- 
a/integration_tests/notebooks/kafka_runners/publish_events.py +++ b/integration_tests/notebooks/kafka_runners/publish_events.py @@ -28,7 +28,6 @@ import json kafka_bootstrap_servers = dbutils.secrets.get(f"{kafka_source_servers_secrets_scope_name}", f"{kafka_source_servers_secrets_scope_key}") for char in kafka_bootstrap_servers: - print(char, end = ' ') producer = KafkaProducer( bootstrap_servers=f"{kafka_bootstrap_servers}", value_serializer=lambda v: json.dumps(v).encode("utf-8"), diff --git a/lakehouse_app/README.md b/lakehouse_app/README.md index 92383508..0953e8e9 100644 --- a/lakehouse_app/README.md +++ b/lakehouse_app/README.md @@ -1,84 +1,117 @@ -# DLT-META Lakehouse App Setup - -Make sure you have installed/upgraded the latest Databricks CLI version (e.g., 0.244.0) and configured workspace access where the app is being deployed. - -## Create App and Attach Source to Databricks Apps - -### Step 1: Create a Custom App ("empty") Using the CLI -For example, if the app name is `demo-dltmeta`: -```bash -databricks apps create demo-dltmeta -``` -Wait for the command execution to complete. It will take a few minutes. - -### Step 2: Checkout Project from DLT-META Git Repository -```bash -git clone https://github.com/databrickslabs/dlt-meta.git -``` - -### Step 3: Navigate to the Project Directory -```bash -cd dlt-meta/lakehouse_app -``` - -### Step 4: Sync the DLT-META App Code to Your Workspace Directory -Run the command below to sync the code (replace `testapp` with your desired folder name): -```bash -databricks sync . /Workspace/Users/@databricks.com/testapp -``` - -### Step 5: Deploy Code to the App Created in Step 1 -```bash -databricks apps deploy demo-dltmeta --source-code-path /Workspace/Users/@databricks.com/testapp -``` - -### Step 6: Open the App in the Browser -- Open the URL from the Step 1 log, or -- Go to the Databricks web page, click **New > App**, click back on **App**, search for your app name, and click on the URL to open the app in the browser. - ---- - -## Run the App Locally - -### Step 1: Checkout Project from DLT-META Git Repository -```bash -git clone https://github.com/databrickslabs/dlt-meta.git -``` - -### Step 2: Navigate to the Project Directory -```bash -cd dlt-meta/lakehouse_app -``` - -### Step 3: Install the Required Dependencies -```bash -pip install -r requirements.txt -``` - -### Step 4: Configure Databricks -```bash -databricks configure --host --token -``` - -### Step 5: Run the App -```bash -python App.py -``` - -### Step 6: Access the App -Click on the URL link: [http://127.0.0.1:5000](http://127.0.0.1:5000) - ---- - -## Databricks App Username - -Databricks creates a unique username for each app, which can be found on the Databricks app page. - -### Step 1: Configure the DLT-META Environment -After launching the app in the browser, click the button **"Setup DLT-META Project Environment"** to configure the DLT-META environment on the app's remote instance for onboarding and deployment activities. - -### Step 2: Onboard a DLT Pipeline -Use the **"UI"** tab to onboard and deploy DLT pipelines based on your pipeline configuration. - -### Step 3: Run Available Demos -Navigate to the **"Demo"** tab to run the available demos. +# DLT-META Lakehouse App + +## Prerequisites + +### System Requirements +- Python 3.8.0 or higher +- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) (latest version, e.g., 0.244.0) +- Configured workspace access + +### Initial Setup +1. 
Authenticate with Databricks: + ```commandline + databricks auth login --host WORKSPACE_HOST + ``` + +2. Setup Python Environment: + ```commandline + git clone https://github.com/databrickslabs/dlt-meta.git + cd dlt-meta + python -m venv .venv + source .venv/bin/activate + pip install databricks-sdk + ``` + +## Deployment Options + +### Deploy to Databricks + +1. Create Custom App: + ```commandline + databricks apps create demo-dltmeta + ``` + > Note: Wait for command completion (a few minutes) + +2. Setup App Code: + ```commandline + cd dlt-meta/lakehouse_app + + # Replace testapp with your preferred folder name + databricks sync . /Workspace/Users/@databricks.com/testapp + + # Deploy the app + databricks apps deploy demo-dltmeta --source-code-path /Workspace/Users/@databricks.com/testapp + ``` + +3. Access the App: + - Open URL from step 1 log, or + - Navigate: Databricks Web UI → New → App → Back to App → Search your app name + +### Run Locally + +1. Setup Environment: + ```commandline + cd dlt-meta/lakehouse_app + pip install -r requirements.txt + ``` + +2. Configure Databricks: + ```commandline + databricks configure --host --token + ``` + +3. Start App: + ```commandline + python App.py + ``` + Access at: http://127.0.0.1:5000 + +## Using DLT-META App + +### App User Setup + +![App User Example](../docs/static/images/app_cli.png) + +The app creates a dedicated user account that: +- Handles onboarding, deployment, and demo execution +- Requires specific permissions for UC catalogs and schemas +- Example username format: "app-40zbx9_demo-dltmeta" + +### Getting Started + +1. Initial Setup: + - Launch app in browser + - Click "Setup dlt-meta project environment" + - This initializes the environment for onboarding and deployment + +2. Pipeline Management: + - Use "UI" tab to onboard and deploy pipelines + - Configure pipelines according to your requirements + + **Onboarding Pipeline:** + + ![Onboarding UI](../docs/static/images/app_onboarding.png) + + *Pipeline onboarding interface for configuring new data pipelines* + + **Deploying Pipeline:** + + ![Deploy UI](../docs/static/images/app_deploy_pipeline.png) + + *Pipeline deployment interface for managing and deploying pipelines* + +3. Demo Access: + - Available demos can be found under "Demo" tab + - Run pre-configured demo pipelines to explore features + + ![App Demo](../docs/static/images/app_run_demos.png) + + *Demo interface showing available example pipelines* + +4. 
Command Line Interface: + - Access CLI features under the "CLI" tab + - Execute commands directly from the web interface + + ![CLI UI](../docs/static/images/app_cli.png) + + *CLI interface for command-line operations* diff --git a/lakehouse_app/app.py b/lakehouse_app/app.py index db2b6989..5089ba14 100644 --- a/lakehouse_app/app.py +++ b/lakehouse_app/app.py @@ -7,7 +7,6 @@ import logging import errno import re -# Use pty to create a pseudo-terminal for better interactive support import pty import select import fcntl @@ -16,7 +15,6 @@ import signal import json -# Configure logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', handlers=[logging.FileHandler("dlt-meta-app.log"), @@ -227,8 +225,7 @@ def start_command(): if 'PYTHONPATH' not in os.environ or not os.path.isdir(os.environ.get('PYTHONPATH', '')): commands = [ "pip install databricks-cli", - # "git clone https://github.com/databrickslabs/dlt-meta.git", - "git clone https://github.com/dattawalake/dlt-meta.git", + "git clone https://github.com/databrickslabs/dlt-meta.git", f"python -m venv {current_directory}/dlt-meta/.venv", f"export HOME={current_directory}", "cd dlt-meta", @@ -236,6 +233,7 @@ def start_command(): f"export PYTHONPATH={current_directory}/dlt-meta/", "pwd", "pip install databricks-sdk", + "pip install PyYAML", ] print("Start setting up dlt-meta environment") for c in commands: @@ -322,6 +320,7 @@ def handle_onboard_form(): "silver_schema": request.form.get('silver_schema', 'dltmeta_silver_7b4e981029b843c799bf61a0a121b3ca'), "dlt_meta_layer": request.form.get('dlt_meta_layer', '1'), "bronze_table": request.form.get('bronze_table', 'bronze_dataflowspec'), + "silver_table": request.form.get('silver_table', 'silver_dataflowspec'), "overwrite": "1" if request.form.get('overwrite') == "1" else "0", "version": request.form.get('version', 'v1'), "environment": request.form.get('environment', 'prod'), @@ -375,26 +374,67 @@ def handle_deploy_form(): def run_demo(): code_to_run = request.json.get('demo_name', '') print(f"processing demo for :{request.json}") - current_directory = os.environ['PYTHONPATH'] # os.getcwd() + current_directory = os.environ['PYTHONPATH'] demo_dict = {"demo_cloudfiles": "demo/launch_af_cloudfiles_demo.py", "demo_acf": "demo/launch_acfs_demo.py", "demo_silverfanout": "demo/launch_silver_fanout_demo.py", - "demo_dias": "demo/launch_dais_demo.py" + "demo_dias": "demo/launch_dais_demo.py", + "demo_dlt_sink": "demo/launch_dlt_sink_demo.py", + "demo_dabs": "demo/generate_dabs_resources.py" } demo_file = demo_dict.get(code_to_run, None) uc_name = request.json.get('uc_name', '') - result = subprocess.run(f"python {current_directory}/{demo_file} --uc_catalog_name {uc_name} --profile DEFAULT", - shell=True, - capture_output=True, - text=True - ) + + if code_to_run == 'demo_dabs': + + # Step 1: Generate Databricks resources + subprocess.run(f"python {current_directory}/{demo_file} --uc_catalog_name {uc_name} " + f"--source=cloudfiles --profile DEFAULT", + shell=True, + capture_output=True, + text=True + ) + + # Step 2: Change working directory to demo/dabs for all next commands + subprocess.run("databricks bundle validate --profile=DEFAULT", cwd=f"{current_directory}/demo/dabs", + shell=True, + capture_output=True, + text=True) + + # Step 4: Deploy the bundle + subprocess.run("databricks bundle deploy --target dev --profile=DEFAULT", + cwd=f"{current_directory}/demo/dabs", shell=True, + capture_output=True, + text=True) + + # Step 5: Run 'onboard_people' 
task + rs1 = subprocess.run("databricks bundle run onboard_people -t dev --profile=DEFAULT", + cwd=f"{current_directory}/demo/dabs", shell=True, + capture_output=True, + text=True) + print(f"onboarding completed: {rs1.stdout}") + # Step 6: Run 'execute_pipelines_people' task + result = subprocess.run("databricks bundle run execute_pipelines_people -t dev --profile=DEFAULT", + cwd=f"{current_directory}/demo/dabs", + shell=True, + capture_output=True, + text=True + ) + print(f"execution of pipeline completed: {result.stdout}") + else: + result = subprocess.run(f"python {current_directory}/{demo_file} --uc_catalog_name {uc_name} " + f"--profile DEFAULT", + shell=True, + capture_output=True, + text=True + ) return extract_command_output(result) def extract_command_output(result): stdout = result.stdout job_id_match = re.search(r"job_id=(\d+) | pipeline=(\d+)", stdout) - url_match = re.search(r"url=(https?://[^\s]+)", stdout) + url_match = re.search(r"(https?://[^\s]+)", stdout) job_id = job_id_match.group(1) or job_id_match.group(2) if job_id_match else None job_url = url_match.group(1) if url_match else None diff --git a/lakehouse_app/templates/landingPage.html b/lakehouse_app/templates/landingPage.html index 75e865a4..2f96ca06 100644 --- a/lakehouse_app/templates/landingPage.html +++ b/lakehouse_app/templates/landingPage.html @@ -655,6 +655,11 @@

[The remaining landingPage.html hunks (the "Step 1 : Onboarding" form, the "Available Demos" list at line 841, and the `const modalContent` template at line 956) contain HTML additions whose markup did not survive extraction; only the visible text labels remain.]