
Fixes to improve starrocks driver support #2449

Closed
phaethon wants to merge 3 commits into dlt-hub:devel from phaethon:devel



phaethon commented Mar 26, 2025

Description

StarRocks, an analytical database, supports multiple table types (duplicate key, primary key, aggregate). dlt does not currently support StarRocks as a separate destination, but it is partly usable through the sqlalchemy destination.

When StarRocks is used through the sqlalchemy destination, dlt creates only "duplicate key" tables. This breaks the "append" and "merge" write dispositions, because the query generated for merge (a DELETE with a subquery) is supported by StarRocks only on "primary key" tables. This pull request creates "primary key" tables when a primary key is set for the table in the hints and create_primary_keys = true, which fixes the "append" and "merge" write dispositions for incremental loading.
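A minimal sketch of the table-type decision described above, built as a plain string rather than through the sqlalchemy dialect. It assumes a dlt-style column schema (a dict of column name to hints with `primary_key`, `nullable` and `data_type` keys); `create_table_ddl` is a hypothetical helper, not dlt's actual code path, and clauses such as `DISTRIBUTED BY` are omitted:

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a dlt table schema: column name -> hints.
Columns = Dict[str, Dict[str, Any]]


def create_table_ddl(table: str, columns: Columns, create_primary_keys: bool) -> str:
    """Build a StarRocks-style CREATE TABLE statement (illustrative only).

    Without primary key hints, or with create_primary_keys disabled, no key
    clause is emitted, so StarRocks creates its default duplicate key table.
    Assumes primary key columns already come first, as StarRocks requires.
    """
    pk: List[str] = [name for name, col in columns.items() if col.get("primary_key")]
    col_defs = ", ".join(
        f"{name} {col['data_type']}" + (" NOT NULL" if col.get("nullable") is False else "")
        for name, col in columns.items()
    )
    ddl = f"CREATE TABLE {table} ({col_defs})"
    if create_primary_keys and pk:
        # A primary key table accepts the DELETE with a subquery used by merge.
        ddl += f" PRIMARY KEY ({', '.join(pk)})"
    return ddl
```

For example, a table with a non-nullable `id` primary key yields `CREATE TABLE t (id BIGINT NOT NULL, name VARCHAR(64)) PRIMARY KEY (id)`, and the same table with `create_primary_keys=False` gets no key clause.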

The pull request consists of two changes:

  • setting the sqlalchemy driver option for "PRIMARY KEY" when there are columns with a primary key
  • reordering columns, as StarRocks requires primary key columns to come first in the column order

If reordering columns is a problem for users of other databases, an additional condition can be added so that this code runs only when the driver is StarRocks. The reordering can also be disabled by setting create_primary_keys = false. I have not researched how many other databases have a similar requirement, or to what extent it is unique to StarRocks.
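The reordering, gated on the driver and on create_primary_keys as suggested above, could look roughly like this. It assumes the dialect name is available as a string (SQLAlchemy exposes it as `engine.dialect.name`; `"starrocks"` is assumed to be the StarRocks dialect's name) and a dlt-style column dict; `order_columns_for_dialect` is a hypothetical name:

```python
from typing import Any, Dict

# Hypothetical stand-in for a dlt table schema: column name -> hints.
Columns = Dict[str, Dict[str, Any]]


def order_columns_for_dialect(
    columns: Columns, dialect_name: str, create_primary_keys: bool
) -> Columns:
    """Move primary key columns to the front, but only for StarRocks.

    Relative order within each group is preserved. Other dialects, and
    create_primary_keys=False, get the columns back unchanged.
    """
    if dialect_name != "starrocks" or not create_primary_keys:
        return columns
    pk = {name: col for name, col in columns.items() if col.get("primary_key")}
    rest = {name: col for name, col in columns.items() if not col.get("primary_key")}
    # Python dicts keep insertion order, so merging pk first puts those columns first.
    return {**pk, **rest}
```

Keeping the check on the dialect name means the column order for other databases is untouched even when primary keys are created.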

For reference: the StarRocks documentation explicitly states that "duplicate key" tables allow only simple conditions for DELETE. See the "DML DELETE" row in the "Data change" table.
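To illustrate the difference, here is a sketch of the two DELETE shapes as plain strings. The subquery form only approximates the kind of statement a merge generates (dlt's actual SQL may differ), and the function, table and column names are hypothetical:

```python
from typing import Sequence


def simple_delete(table: str, key: str, value: int) -> str:
    """Simple-condition DELETE: the form duplicate key tables accept."""
    return f"DELETE FROM {table} WHERE {key} = {value}"


def merge_style_delete(table: str, staging: str, keys: Sequence[str]) -> str:
    """DELETE driven by a subquery on a staging table, as a merge needs.

    StarRocks accepts this form only on primary key tables.
    """
    cond = " AND ".join(f"{staging}.{k} = {table}.{k}" for k in keys)
    return f"DELETE FROM {table} WHERE EXISTS (SELECT 1 FROM {staging} WHERE {cond})"
```

The first form filters on literal values only; the second has to read another table, which is what duplicate key tables reject.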


netlify Bot commented Mar 26, 2025

Deploy Preview for dlt-hub-docs canceled.

🔨 Latest commit: bfb08a0
🔍 Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/67fa9bd9cc0d780008481ce0

rudolfix (Collaborator) left a comment


@phaethon we can merge these fixes, but as part of a larger refactor of the sqlalchemy destinations that will allow destination capabilities to be set per dialect. In this case:

  • the StarRocks naming convention should be plugged in
  • a table adapter should be plugged in that can change the table layout before the table is created
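One way such per-dialect hooks could look, as a sketch only: a registry keyed by dialect name that maps to a table adapter applied before the table is created. `TABLE_ADAPTERS`, `starrocks_table_adapter` and `adapt_table` are hypothetical names, not existing dlt APIs, and the column schema is a simplified dlt-style dict:

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a dlt table schema: column name -> hints.
Columns = Dict[str, Dict[str, Any]]
TableAdapter = Callable[[Columns], Columns]


def starrocks_table_adapter(columns: Columns) -> Columns:
    """Put primary key columns first, as StarRocks requires.

    sorted() is stable, so columns keep their relative order within each group;
    the key is False for primary key columns, which therefore sort first.
    """
    return dict(sorted(columns.items(), key=lambda item: not item[1].get("primary_key")))


# Hypothetical registry: dialect name -> adapter run before CREATE TABLE.
TABLE_ADAPTERS: Dict[str, TableAdapter] = {"starrocks": starrocks_table_adapter}


def adapt_table(dialect_name: str, columns: Columns) -> Columns:
    """Apply the registered adapter for the dialect, if any; otherwise pass through."""
    adapter = TABLE_ADAPTERS.get(dialect_name)
    return adapter(columns) if adapter else columns
```

A registry keeps dialect quirks out of the shared table-creation code: supporting another database means registering one more adapter rather than adding another branch.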

rudolfix self-assigned this Apr 1, 2025
phaethon (Author) commented Apr 1, 2025

I am currently working on a larger update that implements StarRocks as a separate destination. It implements StarRocks-specific loading mechanisms (INSERT FROM FILES, Stream Load) and a naming convention that adds a suffix to reserved keywords. It inherits many of its classes from the sqlalchemy destination.
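The naming convention mentioned above can be sketched as a function that appends a suffix to identifiers that collide with reserved keywords. The keyword set is a small illustrative subset, and both the `_` suffix and the `normalize_identifier` name are assumptions; the WIP branch may use a different suffix and a full keyword list:

```python
# Illustrative subset only; StarRocks reserves many more keywords.
RESERVED_KEYWORDS = frozenset({"select", "from", "where", "table", "order", "group"})

# Assumed suffix; the actual naming convention may use a different one.
SUFFIX = "_"


def normalize_identifier(name: str) -> str:
    """Lowercase the identifier and add the suffix if it is a reserved keyword."""
    ident = name.lower()
    return ident + SUFFIX if ident in RESERVED_KEYWORDS else ident
```

Suffixing rather than quoting keeps the resulting names usable in hand-written queries without escaping.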

The work in progress can be seen at: https://github.com/phaethon/dlt/tree/starrocks

As I currently see it, this PR should be closed in favor of the separate destination.

phaethon closed this Apr 12, 2025