Merged

20 commits
0ffe82e
Adding new demo's under doc site
ravi-databricks Aug 27, 2025
4fa0ff5
updated docs, readme for:
ravi-databricks Aug 27, 2025
62504c2
added dlt sink demo and silver spec table name ui component
dattawalake-db Aug 27, 2025
15961f9
Removed prints from int-tests
ravi-databricks Aug 28, 2025
ab10433
fix lint issue
dattawalake-db Aug 28, 2025
f92c844
fix lint issue
dattawalake-db Aug 28, 2025
051229c
added faq and app images
dattawalake-db Aug 29, 2025
2d0f732
added faq and app images
dattawalake-db Aug 29, 2025
157fbf4
added faq and app images
dattawalake-db Aug 29, 2025
5f1baa1
Merge pull request #212 from dattawalake/Issue_211
ravi-databricks Aug 29, 2025
51e7636
Organized lakehouse app docsite content
ravi-databricks Aug 31, 2025
c44e1ae
Merge pull request #215 from databrickslabs/issue_211
ravi-databricks Aug 31, 2025
7df49f0
Modified lakehouse app faq and setup
ravi-databricks Aug 31, 2025
47924f5
Merge pull request #216 from databrickslabs/issue_211
ravi-databricks Aug 31, 2025
e872aaf
Corrected lakehouse app faq
ravi-databricks Sep 2, 2025
d8b7dfd
Merge pull request #217 from databrickslabs/issue_211
ravi-databricks Sep 2, 2025
c715a49
Updated demo gifs in readme and docs
ravi-databricks Sep 3, 2025
c00efec
Merge pull request #210 from databrickslabs/issue_209
ravi-databricks Sep 3, 2025
c1b619e
Updated lakehouse app readme to match docs site
ravi-databricks Sep 3, 2025
d87900f
Updated readme replacing DLT with Lakeflow Declarative Pipeline
ravi-databricks Sep 3, 2025
40 changes: 20 additions & 20 deletions README.md
@@ -53,14 +53,15 @@ In practice, a single generic pipeline reads the Dataflowspec and uses it to orc
| Custom transformations | Bronze, Silver layer accepts custom functions|
| Data Quality Expectations Support | Bronze, Silver layer |
| Quarantine table support | Bronze layer |
| [apply_changes](https://docs.databricks.com/en/delta-live-tables/python-ref.html#cdc) API support | Bronze, Silver layer |
| [apply_changes_from_snapshot](https://docs.databricks.com/en/delta-live-tables/python-ref.html#change-data-capture-from-database-snapshots-with-python-in-delta-live-tables) API support | Bronze layer|
| [create_auto_cdc_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes) API support | Bronze, Silver layer |
| [create_auto_cdc_from_snapshot_flow](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-apply-changes-from-snapshot) API support | Bronze layer|
| [append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#use-append-flow-to-write-to-a-streaming-table-from-multiple-source-streams) API support | Bronze layer|
| Liquid cluster support | Bronze, Bronze Quarantine, Silver tables|
| [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) | ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` |
| Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode |
| [DLT Sinks](https://docs.databricks.com/aws/en/delta-live-tables/dlt-sinks) | Supported formats: external ```delta table```, ```kafka```. Bronze, Silver layers |
| [create_sink](https://docs.databricks.com/aws/en/dlt-ref/dlt-python-ref-sink) API support | Supported formats: external ```delta table```, ```kafka```. Bronze, Silver layers |
| [Databricks Asset Bundles](https://docs.databricks.com/aws/en/dev-tools/bundles/) | Supported |
| [DLT-META UI](https://github.com/databrickslabs/dlt-meta/tree/main/lakehouse_app#dlt-meta-lakehouse-app-setup) | Uses Databricks Lakehouse DLT-META App |
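
For orientation, the ```create_sink``` row above corresponds to the Lakeflow Declarative Pipelines sink API. The sketch below is a hand-written illustration of that API, not code generated by DLT-META; the sink name, topic, broker, and source table are placeholders.

```python
import dlt
from pyspark.sql.functions import to_json, struct

# Illustrative external Kafka sink; broker, topic, and table names are placeholders.
dlt.create_sink(
    name="orders_kafka_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "<broker-host>:9092",
        "topic": "orders_out",
    },
)

# An append flow feeding the sink; the Kafka writer expects a `value` column.
@dlt.append_flow(name="orders_to_kafka", target="orders_kafka_sink")
def orders_to_kafka():
    return dlt.read_stream("orders_silver").select(to_json(struct("*")).alias("value"))
```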

## Getting Started

@@ -137,38 +138,37 @@ If you want to run existing demo files please follow these steps before running
dlt_meta_home=$(pwd)
export PYTHONPATH=$dlt_meta_home
```
![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)


7. Run onboarding command:
```commandline
databricks labs dlt-meta onboard
```
![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)


Above commands will prompt you to provide onboarding details. If you have cloned dlt-meta git repo then accept defaults which will launch config from demo folder.
The command will prompt you to provide onboarding details. If you have cloned the dlt-meta repository, you can accept the default values which will use the configuration from the demo folder.
![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif)


- Go to your Databricks workspace and locate the onboarding job under: Workflows -> Job runs
The above onboard CLI command will:
1. Push code and data to your Databricks workspace
2. Create an onboarding job
3. Display a success message: ```Job created successfully. job_id={job_id}, url=https://{databricks workspace url}/jobs/{job_id}```
4. Job URL will automatically open in your default browser.
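
If you prefer to confirm the onboarding job programmatically instead of through the browser, a minimal check with the Databricks SDK for Python could look like this (the name filter is an assumption; adjust it to the job name printed by the CLI):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reuses the credentials configured via `databricks auth login`
for job in w.jobs.list():
    name = job.settings.name if job.settings and job.settings.name else ""
    if "dlt" in name.lower():  # assumed substring; match it to your onboarding job name
        print(job.job_id, name)
```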

### Deploy using dlt-meta CLI:

- Once onboarding jobs is finished deploy `bronze` and `silver` DLT using below command
- Once the onboarding job has finished, deploy the Lakeflow Declarative Pipeline using the command below
- ```commandline
databricks labs dlt-meta deploy
```
- - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps
- - Bronze DLT

![deployingDLTMeta_bronze.gif](docs/static/images/deployingDLTMeta_bronze.gif)
The command will prompt you to provide pipeline configuration details.

![deployingDLTMeta_bronze_silver.gif](docs/static/images/deployingDLTMeta_bronze_silver.gif)

- Silver DLT
- - ```commandline
databricks labs dlt-meta deploy
```
- - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps

![deployingDLTMeta_silver.gif](docs/static/images/deployingDLTMeta_silver.gif)
The above deploy CLI command will:
1. Deploy a Lakeflow Declarative Pipeline with dlt-meta configuration such as ```layer```, ```group```, ```dataflowSpec table details```, etc., to your Databricks workspace
2. Display message: ```dlt-meta pipeline={pipeline_id} created and launched with update_id={pipeline_update_id}, url=https://{databricks workspace url}/#joblist/pipelines/{pipeline_id}```
3. The pipeline URL will automatically open in your default browser.
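
Similarly, the deployed pipeline can be listed with the Databricks SDK for Python if you want a programmatic check instead of the browser (a sketch, not part of the CLI workflow):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
for p in w.pipelines.list_pipelines():
    print(p.pipeline_id, p.name, p.state)
```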


## More questions
38 changes: 0 additions & 38 deletions docs/content/app/_index.md

This file was deleted.

20 changes: 16 additions & 4 deletions docs/content/demo/Append_FLOW_CF.md
@@ -21,15 +21,26 @@ This demo will perform following tasks:
databricks auth login --host WORKSPACE_HOST
```

3. ```commandline
3. Install Python package requirements:
```commandline
# Core requirements
pip install "PyYAML>=6.0" setuptools databricks-sdk

# Development requirements
pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
```

4. Clone dlt-meta:
```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
5. Navigate to project directory:
```commandline
cd dlt-meta
```

5. Set python environment variable into terminal
6. Set the Python environment variable in your terminal
```commandline
dlt_meta_home=$(pwd)
```
@@ -38,7 +49,8 @@ This demo will perform following tasks:
export PYTHONPATH=$dlt_meta_home
```

6. ```commandline
7. Run the command:
```commandline
python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=dlt_meta_uc
```

22 changes: 17 additions & 5 deletions docs/content/demo/Append_FLOW_EH.md
@@ -18,21 +18,32 @@ draft: false
databricks auth login --host WORKSPACE_HOST
```

3. ```commandline
3. Install Python package requirements:
```commandline
# Core requirements
pip install "PyYAML>=6.0" setuptools databricks-sdk

# Development requirements
pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
```

4. Clone dlt-meta:
```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
5. Navigate to project directory:
```commandline
cd dlt-meta
```
5. Set python environment variable into terminal
6. Set the Python environment variable in your terminal
```commandline
dlt_meta_home=$(pwd)
```
```commandline
export PYTHONPATH=$dlt_meta_home
```
6. Eventhub
7. Configure Eventhub
- Requires a running Eventhub instance
- Requires two Eventhub topics: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow)
- Create a Databricks secrets scope for the Eventhub keys
@@ -61,7 +72,8 @@ draft: false
- eventhub_secrets_scope_name: Databricks secret scope name e.g. eventhubs_dltmeta_creds
- eventhub_port: Eventhub port
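
As a sketch, the secrets scope mentioned in step 7 can also be created with the Databricks SDK for Python instead of the CLI; the scope and key names below mirror the example values in this demo and should be replaced with your own:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
scope = "eventhubs_dltmeta_creds"  # example scope name used in this demo
w.secrets.create_scope(scope=scope)
w.secrets.put_secret(
    scope=scope,
    key="RootManageSharedAccessKey",  # matches eventhub_accesskey_secret_name above
    string_value="<your-eventhub-shared-access-key>",
)
```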

7. ```commandline
8. Run the command:
```commandline
python demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --uc_catalog_name=dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=dltmeta_eventhub_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
```

20 changes: 16 additions & 4 deletions docs/content/demo/Apply_Changes_From_Snapshot.md
@@ -26,21 +26,33 @@ draft: false
databricks auth login --host WORKSPACE_HOST
```

3. ```commandline
3. Install Python package requirements:
```commandline
# Core requirements
pip install "PyYAML>=6.0" setuptools databricks-sdk

# Development requirements
pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
```

4. Clone dlt-meta:
```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
5. Navigate to project directory:
```commandline
cd dlt-meta
```
5. Set python environment variable into terminal
6. Set the Python environment variable in your terminal
```commandline
dlt_meta_home=$(pwd)
```
```commandline
export PYTHONPATH=$dlt_meta_home
```

6. ```commandline
7. Run the command:
```commandline
python demo/launch_acfs_demo.py --uc_catalog_name=<<uc catalog name>>
```
- uc_catalog_name : Unity catalog name
98 changes: 98 additions & 0 deletions docs/content/demo/DAB.md
@@ -0,0 +1,98 @@
---
title: "DAB Demo"
date: 2024-02-26T14:25:26-04:00
weight: 28
draft: false
---

### DAB Demo

## Overview
This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-Meta.

This demo will perform the following steps:
- Create dlt-meta schemas for dataflowspec and the bronze/silver layers
- Upload necessary resources to unity catalog volume
- Create DAB files with catalog, schema, file locations populated
- Deploy DAB to databricks workspace
- Run onboarding using DAB commands
- Run Bronze/Silver Pipelines using DAB commands
- Demo examples will showcase the fan-out pattern in the silver layer
- Demo examples will showcase custom transformations for the bronze/silver layers
- Adding custom columns and metadata to Bronze tables
- Implementing SCD Type 1 to Silver tables
- Applying expectations to filter data in Silver tables

### Steps:
1. Launch Command Prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
- Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:

```commandline
databricks auth login --host WORKSPACE_HOST
```

3. Install Python package requirements:
```commandline
# Core requirements
pip install "PyYAML>=6.0" setuptools databricks-sdk

# Development requirements
pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
```

4. Clone dlt-meta:
```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

5. Navigate to project directory:
```commandline
cd dlt-meta
```

6. Set the Python environment variable in your terminal:
```commandline
dlt_meta_home=$(pwd)
export PYTHONPATH=$dlt_meta_home
```

7. Generate DAB resources and set up schemas:
This command will:
- Generate DAB configuration files
- Create DLT-Meta schemas
- Upload necessary files to volumes
```commandline
python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>
```
> Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token.

8. Deploy and run the DAB bundle:
- Navigate to the DAB directory:
```commandline
cd demo/dabs
```

- Validate the bundle configuration:
```commandline
databricks bundle validate --profile=<your_profile>
```

- Deploy the bundle to dev environment:
```commandline
databricks bundle deploy --target dev --profile=<your_profile>
```

- Run the onboarding job:
```commandline
databricks bundle run onboard_people -t dev --profile=<your_profile>
```

- Execute the pipelines:
```commandline
databricks bundle run execute_pipelines_people -t dev --profile=<your_profile>
```

![dab_onboarding_job.png](/images/dab_onboarding_job.png)
![dab_dlt_pipelines.png](/images/dab_dlt_pipelines.png)
20 changes: 16 additions & 4 deletions docs/content/demo/DAIS.md
@@ -23,23 +23,35 @@ This demo showcases DLT-META's capabilities of creating Bronze and Silver DLT pi
databricks auth login --host WORKSPACE_HOST
```

3. ```commandline
3. Install Python package requirements:
```commandline
# Core requirements
pip install "PyYAML>=6.0" setuptools databricks-sdk

# Development requirements
pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
```

4. Clone dlt-meta:
```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
5. Navigate to project directory:
```commandline
cd dlt-meta
```

5. Set python environment variable into terminal
6. Set the Python environment variable in your terminal
```commandline
dlt_meta_home=$(pwd)
```
```commandline
export PYTHONPATH=$dlt_meta_home
```

6. ```commandline
7. Run the command:
```commandline
python demo/launch_dais_demo.py --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=<<>>
```
- uc_catalog_name: unity catalog name