Commit cfee93b

Merge branch 'main' into dependabot/pip/duckdb-gte-1.2.2-and-lt-1.4.0
2 parents d64013a + bc1a518 commit cfee93b

60 files changed, +1811 -905 lines changed

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 0 additions & 4 deletions
@@ -3,7 +3,3 @@ contact_links:
   - name: General Databricks questions
     url: https://help.databricks.com/
     about: Issues related to Databricks and not related to Lakebridge
-
-  - name: Lakebridge Documentation
-    url: https://databrickslabs.github.io/lakebridge/
-    about: Documentation about Lakebridge
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# See https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/syntax-for-issue-forms
+# and https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/syntax-for-githubs-form-schema
+name: Documentation update
+description: Something new needs to be updated in the Lakebridge documentation
+title: "[DOCS]: "
+labels: ["needs-triage","documentation"]
+type: "task"
+
+body:
+  - type: checkboxes
+    attributes:
+      label: Is there an existing issue for this?
+      description: Please search to see if an issue already exists for the feature request you're willing to submit
+      options:
+        - label: I have searched the existing issues
+          required: true
+  - type: textarea
+    attributes:
+      label: Problem statement
+      description: A clear and concise description of what the problem is. Ex. The documentation is unclear how to do [...]
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Additional Context
+      description: Add any other context, references or screenshots about the feature request here.
+    validations:
+      required: false

CHANGELOG.md

Lines changed: 105 additions & 0 deletions
@@ -1,5 +1,110 @@
 # Version changelog
 
+## 0.10.4
+
+* Added Source Tech Override for Analyzer ([#1806](https://github.com/databrickslabs/lakebridge/issues/1806)). The Analyzer command has been enhanced with a `source-tech` flag, allowing users to specify the Source System Technology to analyze directly in the command line call.
+* Patch user agent for Infa ([#1807](https://github.com/databrickslabs/lakebridge/issues/1807)). Improved user agent handling for dialects with spaces and added Informatica PC support.
+
+## 0.10.3
+
+# Converter Improvements
+## General:
+- Updated CLI argument handling for transpile (See [1637](https://github.com/databrickslabs/lakebridge/issues/1637)): The transpile command now has improved argument validation, clearer error handling, and more flexible configuration options.
+- Workaround issue loading transpiler configuration with python ≥ 3.12.4 (See [1802](https://github.com/databrickslabs/lakebridge/issues/1802)):
+Fixed an issue with transpiler configuration loading on Python 3.12.4+ by updating type hints and removing problematic imports.
+
+### Bladebridge Converter
+*Teradata*
+- Enhanced handling of the TRUNC function and improved date part translation logic for more accurate Teradata conversions.
+
+*Synapse*
+- Fixed datatype conversion issues and removed unnecessary parentheses in DDL statements for Synapse.
+- Improved header cleaning, removed unsupported N String literals, and fixed several DDL issues including datatype and null literal handling.
+- Merged Synapse and MS SQL config files, fixed code loss and datatype wrapping issues, improved handling of ALTER TABLE and view definitions, and added new datatype mappings and regex patterns.
+
+*MS SQL*
+- Fixed datatype conversion issues and removed unnecessary parentheses in DDL statements for MS SQL.
+- Fixed issues with object_id handling and resolved transpiler errors with IF conditions in SQL code.
+- Unified configuration with Synapse, addressed code loss, improved datatype and view handling, and cleaned up redundant SQL commands.
+
+*Datastage*
+- Added support for Datastage functions such as DateFromComponents, ALNUM, and SURROGATEKEYGEN, and enhanced function substitution.
+- Fixed expression and filter handling, improved function substitution, and enhanced literal wrapping and SQL expression handling for Datastage to Pyspark conversions.
+
+*Datastage and Informatica PySpark target:*
+- Fixed issues with AGGREGATE node handling and improved column/expression wrapping in aggregate nodes.
+- Enhanced import handling, removed unnecessary aliases, and improved pre- and post-SQL expression processing for PySpark.
+
+*General / Multi-Dialect*
+Fixed issues with generating single output files in nested folders, improving output file handling for XML, Python, and JSON formats.
+
+# Reconcile improvements
+- Enabled TSQL Recon (See [1798](https://github.com/databrickslabs/lakebridge/issues/1798)): Added support for TSQL-based reconciliation, allowing TSQL scripts as input and updating the SQL Server adapter and tests for TSQL compatibility.
+
+# Documentation updates
+- Banner for Informatica Cloud (See [1797](https://github.com/databrickslabs/lakebridge/issues/1797)): Informatica Cloud is temporarily unsupported; a warning banner and updated docs now advise users to contact Databricks for alternatives while a fix is in progress.
+- Documentation for Reconcile Automation (See [1793](https://github.com/databrickslabs/lakebridge/issues/1793)): New documentation and utilities streamline table reconciliation, including example notebooks, validation rules, and a static web interface for Snowflake transformations.
+- Update python requirements (See [1766](https://github.com/databrickslabs/lakebridge/issues/1766)): The library now supports Python 3.10 and above, with updated installation instructions and emphasis on Java 11+ requirements.
+
+# General
+- Improve diagnostics if the Java check fails prior to installing morpheus (See [1784](https://github.com/databrickslabs/lakebridge/issues/1784)): Enhanced Java version checks provide clearer error messages and better logging if Java is missing or incompatible during installation.
+- Updated blueprint dependency, to ensure login URLs are accepted as host (See [1760](https://github.com/databrickslabs/lakebridge/issues/1760)): Blueprint dependency updated to allow login URLs as workspace hosts, resolving previous issues with host profile settings.
+
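Review note on the Java entries in 0.10.3 (the Java check in #1784 and the Java 11+ requirement in #1766): a version check of this kind could look roughly like the sketch below. This is not Lakebridge's actual code. `parse_java_major` and `java_is_supported` are hypothetical names, and the input is assumed to be the text that `java -version` prints.

```python
import re
from typing import Optional

# Hypothetical helper, not taken from the Lakebridge sources.
# Matches the quoted version in output such as:
#   openjdk version "17.0.9" 2023-10-17
#   java version "1.8.0_391"
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_major(version_output: str) -> Optional[int]:
    """Return the major Java version, or None if no version string is found."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = match.group(2)
    # Legacy numbering: "1.8.0_391" denotes Java 8.
    if major == 1 and minor is not None:
        return int(minor)
    return major


def java_is_supported(version_output: str, minimum: int = 11) -> bool:
    """True when a version was found and it meets the assumed minimum."""
    major = parse_java_major(version_output)
    return major is not None and major >= minimum
```

Returning `None` for unparseable output keeps "Java missing or unreadable" distinct from "Java too old", which is what lets the caller print a specific error message for each case.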
+## 0.10.2
+
+Analyzer Improvements
+- Enabled BODS as a source
+
+Converter Improvements
+- Better Handling of Unicode in SQL Files: No more weird characters! Lakebridge now automatically detects and removes Unicode BOMs from SQL files, ensuring your files load cleanly—no matter the encoding. (See [#1733](https://github.com/databrickslabs/lakebridge/issues/1733))
+- Cleaner Output Files: Header comments that sometimes caused formatting issues in Python or JSON files are now gone. Your output files will only contain the code you need—no extra comments at the top. (See [#1751](https://github.com/databrickslabs/lakebridge/issues/1751))
+- Bug fix: Fixed PyArmor issue affecting Windows installations.
+- Morpheus converter:
+  - Databricks Tuple Support: You can now use multi-column (tuple) comparisons like WHERE (A, B, C) NOT IN (SELECT X, Y, Z...)—improving compatibility with Snowflake and Databricks SQL and making complex queries work as expected.
+  - TRUNCATE Function Transformation: Added support for converting the TRUNCATE function and new related keywords, expanding the range of SQL statements you can process.
+  - ALL and ANY Subquery Expressions: The converter now understands and supports ALL and ANY subquery expressions, so you can handle more complex SQL logic with ease.
+  - Improved Snowflake LET Command: The LET command for Snowflake now works even if you don’t provide an assignment or default value.
+- BladeBridge converter:
+  - Datastage: Improved support for duplicate link names
+  - Datastage: Fixed filter for EXPRESSION in PySpark target
+  - Informatica: Fixed output, now writing to flat file for SparkSql. Placing source post sql after the target writes
+
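Review note on the BOM entry in 0.10.2 (#1733): the changelog does not say how the BOM is detected, so the following is only a minimal sketch of one common approach using the standard-library `codecs` constants. The function name `decode_sql_bytes` and the UTF-8 fallback are assumptions for illustration.

```python
import codecs

# Checked in this order on purpose: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM, so UTF-32 must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_sql_bytes(raw: bytes, default_encoding: str = "utf-8") -> str:
    """Decode SQL file contents, dropping a leading BOM if one is present.

    Hypothetical helper: the fallback encoding is an assumption.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM bytes, then decode with the encoding they signal.
            return raw[len(bom):].decode(encoding)
    return raw.decode(default_encoding)
```

For example, `decode_sql_bytes(codecs.BOM_UTF8 + b"SELECT 1")` yields the plain string `SELECT 1` with no leading `\ufeff` character.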
+Documentation Refresh
+- Clearer instructions for installation, setup, and requirements. Updated examples and requirements (See [#1738](https://github.com/databrickslabs/lakebridge/issues/1738))
+- Updated converters supported dialects matrix for clarity on supported input and outputs (See [#1764](https://github.com/databrickslabs/lakebridge/pull/1764))
+- Improved Docs Sidebar Navigation: The documentation sidebar is now smarter and more interactive, making it easier to find what you need quickly. (See [#1754](https://github.com/databrickslabs/lakebridge/issues/1754))
+
+## 0.10.1
+
+**Analyzer Improvements**
+
+- **Debug Mode for Analyzer** ([\#1727](https://github.com/databrickslabs/lakebridge/issues/1727)): Run the Analyzer in debug mode by setting your logging level to DEBUG for more detailed diagnostics.
+- **Supported Sources Table** ([\#1709](https://github.com/databrickslabs/lakebridge/issues/1709), [\#1708](https://github.com/databrickslabs/lakebridge/issues/1708)): The docs now clearly list all supported source platforms and dialects, so you can quickly check compatibility.
+
+Converter Improvements
+- **Encoding Support** ([\#1719](https://github.com/databrickslabs/lakebridge/issues/1719)): Lakebridge now handles quoted-printable encoding in ETL sources.
+- **Java Version Handling** ([\#1730](https://github.com/databrickslabs/lakebridge/issues/1730), [\#1731](https://github.com/databrickslabs/lakebridge/issues/1731)): The system now detects if Java isn’t installed and gives clear error messages. Java version parsing is also improved.
+- **Cleaner Output** ([\#1684](https://github.com/databrickslabs/lakebridge/issues/1684)): Transpiled code output no longer includes unnecessary line number comments.
+- BladeBridge converter inserts `FIXME` comments in lines of code we couldn't automatically convert
+- BladeBridge converter enabled Informatica Cloud migrations
+
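Review note on the "Encoding Support" entry in 0.10.1 (#1719): for readers unfamiliar with quoted-printable, the sketch below shows what decoding it involves, using Python's standard `quopri` module. Whether Lakebridge uses `quopri` is not stated in the changelog; the helper name and the UTF-8 default charset are assumptions.

```python
import quopri


def decode_quoted_printable(raw: bytes, charset: str = "utf-8") -> str:
    """Decode quoted-printable bytes into text.

    Hypothetical helper for illustration. In quoted-printable, "=3D" stands
    for a literal "=", "=C3=A9" for the UTF-8 bytes of "é", and a trailing
    "=" at the end of a line is a soft line break that is removed.
    """
    return quopri.decodestring(raw).decode(charset)
```

An ETL export that stores an expression as `WHERE a =3D 1` would therefore decode to `WHERE a = 1`.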
+**Installation and Configuration Updates**
+
+- **Smarter Install Process** ([\#1691](https://github.com/databrickslabs/lakebridge/issues/1691)): The installer now avoids errors if you choose not to overwrite existing configurations.
+- **Configure Reconcile Patch** ([\#1690](https://github.com/databrickslabs/lakebridge/issues/1690)): Deployment of reconciliation jobs, tables, and dashboards now works as expected, targeting the correct files.
+
+**Logging and Error Reporting**
+
+- **Cleaner Logging** ([\#1704](https://github.com/databrickslabs/lakebridge/issues/1704)): Log messages are less noisy and more consistent, with important info easier to spot.
+- **Compact Error Reporting** ([\#1693](https://github.com/databrickslabs/lakebridge/issues/1693)): Errors are grouped and summarized, making it easier to review issues.
+- **Severity-based Logging** ([\#1685](https://github.com/databrickslabs/lakebridge/issues/1685)): Diagnostic messages are logged with the right severity (error, warning, info).
+
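Review note on the "Severity-based Logging" entry (#1685): the idea of routing each diagnostic to the matching log level can be sketched as below. The `Severity` enum and `log_diagnostic` function are hypothetical stand-ins, not Lakebridge types; only the three severities (error, warning, info) come from the changelog.

```python
import logging
from enum import Enum


class Severity(Enum):
    """Hypothetical diagnostic severities, mirroring the changelog entry."""
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


# Map each diagnostic severity to the standard-library logging level.
_LEVELS = {
    Severity.ERROR: logging.ERROR,
    Severity.WARNING: logging.WARNING,
    Severity.INFO: logging.INFO,
}


def log_diagnostic(logger: logging.Logger, severity: Severity, message: str) -> int:
    """Log the message at the level matching its severity and return that level."""
    level = _LEVELS[severity]
    logger.log(level, message)
    return level
```

With this mapping, a handler configured at `WARNING` shows errors and warnings but hides informational diagnostics, which is the practical benefit of logging each one at its own level.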
+**General Documentation and Template Updates**
+
+- **Clearer, Friendlier Docs** ([\#1701](https://github.com/databrickslabs/lakebridge/issues/1701), [\#1688](https://github.com/databrickslabs/lakebridge/issues/1688)): Installation and usage guides are now easier to follow, with new flowcharts, step-by-step instructions, and improved formatting.
+- **New Issue Templates** ([\#1721](https://github.com/databrickslabs/lakebridge/issues/1721), [\#1687](https://github.com/databrickslabs/lakebridge/issues/1687), [\#1682](https://github.com/databrickslabs/lakebridge/issues/1682)): Submitting documentation or bug issues is easier with new, interactive templates.
+- **Supported Sources and Dialects** ([\#1709](https://github.com/databrickslabs/lakebridge/issues/1709), [\#1708](https://github.com/databrickslabs/lakebridge/issues/1708)): Docs now include clear tables outlining supported platforms and SQL dialects, including experimental dbt repointing.
+
 ## 0.10.0
 # 🚀 Lakebridge v0.10.0 – The Bridge to Databricks Awaits! 🌉
 

docs/lakebridge/docs/assessment/analyzer/index.mdx

Lines changed: 2 additions & 2 deletions
@@ -39,11 +39,11 @@ Analyzer accepts as an input a folder name containing the files and subfolders t
 
 Below is the detailed explanation on the arguments required for Analyzer.
 - `source-directory [Required]` - Absolute folder path containing the legacy artifacts.
-- `report-file [Required]` - Name of the report file to produce the information info. Must have .xlsx extension
+- `report-file [Required]` - Name of the report file to produce the information info. **IMPORTANT:** Must have .xlsx extension
 #### Before starting the execution, Analyzer will prompt for the type of technology it is scanning
 
 ## Execution
-Execute the below command to initialize the transpile process.
+Execute the below command to initialize the analyze process.
 ```bash
 databricks labs lakebridge analyze --source-directory <absolute-path> --report-file <absolute-path>
 ```
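Review note on the Analyzer arguments: the two documented constraints (an absolute source folder, a report file ending in `.xlsx`) can be checked before the CLI is invoked. The sketch below is a hypothetical wrapper, not part of Lakebridge. `build_analyze_command` is an invented name, the optional `--source-tech` flag is taken from the 0.10.4 changelog entry, and the value passed to it in the usage note is purely illustrative.

```python
from pathlib import Path
from typing import List, Optional


def build_analyze_command(
    source_directory: str,
    report_file: str,
    source_tech: Optional[str] = None,
) -> List[str]:
    """Validate the documented constraints and return the argv list.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if not Path(source_directory).is_absolute():
        raise ValueError("source-directory must be an absolute folder path")
    if Path(report_file).suffix != ".xlsx":
        raise ValueError("report-file must have the .xlsx extension")

    command = [
        "databricks", "labs", "lakebridge", "analyze",
        "--source-directory", source_directory,
        "--report-file", report_file,
    ]
    # Flag name from the 0.10.4 changelog; accepted values are not listed there.
    if source_tech is not None:
        command += ["--source-tech", source_tech]
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`. Passing something like `source_tech="Synapse"` (an assumed value) would append the flag so that the Analyzer presumably no longer needs to prompt for the technology type.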

docs/lakebridge/docs/dev/index.mdx

Lines changed: 0 additions & 2 deletions
@@ -6,8 +6,6 @@ sidebar_position: 60
 
 import Admonition from '@theme/Admonition';
 
-# Authoring Documentation
-
 This document provides guidelines for writing documentation for the Lakebridge project.
 
 

docs/lakebridge/docs/reconcile/faq.mdx renamed to docs/lakebridge/docs/faq.mdx

Lines changed: 25 additions & 3 deletions
@@ -1,6 +1,28 @@
-# FAQ
+---
+sidebar_position: 8
+---
+# FAQs
 
-## Guidance for Oracle as a source
+## [Table of Contents](#table-of-contents)
+* [Installation](#installation)
+* [Reconcile](#reconcile)
+
+----
+
+## Installation
+### <u> Install Databricks CLI on Linux without brew</u>
+```shell
+#!/usr/bin/env bash
+
+#install dependencies
+apt update && apt install -y curl sudo unzip
+
+#install databricks cli
+curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/v0.242.0/install.sh | sudo sh
+```
+----
+## Reconcile
+### <u>Guidance for Oracle as a source</u>
 
 ### Driver
 
@@ -29,7 +51,7 @@ This installation is a necessary step to enable seamless comparison between Orac
 required Oracle JDBC functionality is readily available within the Databricks environment.
 
 
-## Commonly Used Custom Transformations
+### <u>Commonly Used Custom Transformations</u>
 
 | source_type | data_type | source_transformation | target_transformation | source_value_example | target_value_example | comments |
 |-------------|---------------|--------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|-------------------------|-------------------------|---------------------------------------------------------------------------------------------|
