
Azure support? #7

Open
bradlindblad opened this issue Jun 22, 2024 · 7 comments

Comments

@bradlindblad

Will there be support for Databricks in Azure at some point?

@gdubya

gdubya commented Aug 13, 2024

Azure is (possibly) nearly already supported. I tested using the following script:

INSTALL uc_catalog FROM core_nightly;
INSTALL delta FROM core;
LOAD delta;
LOAD uc_catalog;

CREATE SECRET (
    TYPE UC,
    TOKEN '${mytoken}',
    ENDPOINT '${my_db_workspace}'
);

CREATE SECRET (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'cli',
    ACCOUNT_NAME 'mystorageaccount'
);

ATTACH 'my_catalog' AS my_catalog (TYPE UC_CATALOG);
SHOW ALL TABLES;

SELECT * FROM my_catalog.my_schema.test_table1;

but I encountered duckdb/duckdb-delta#71 when selecting from a table.

@gdubya

gdubya commented Sep 15, 2024

This works now. The submodules just need to be bumped to v1.1.0 (CC @samansmink)

@glalanne

Trying the extension with Azure Databricks using DuckDB v1.1.1 af39bd0dcf.

I can execute the SQL script above with no errors, but SHOW ALL TABLES returns 0 rows.
Any suggestions for troubleshooting this to see where the issue is?

Using the Databricks CLI returns my tables.
Thanks
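
As a rough cross-check (a sketch, not something from this thread): list the schema's tables via the Unity Catalog REST API using the same endpoint and token as the UC secret. The workspace URL, token, catalog, and schema names below are placeholders. If this returns tables while SHOW ALL TABLES still shows 0 rows, the credentials are probably fine and the issue is likely on the extension side.

import requests

WORKSPACE = "https://my-workspace.azuredatabricks.net"  # placeholder, same value as the UC secret ENDPOINT
TOKEN = "..."  # placeholder, same token as the UC secret

# List tables in the schema via the Unity Catalog REST API.
resp = requests.get(
    f"{WORKSPACE}/api/2.1/unity-catalog/tables",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"catalog_name": "my_catalog", "schema_name": "my_schema"},
)
resp.raise_for_status()
for table in resp.json().get("tables", []):
    print(table["full_name"])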

@gdubya

gdubya commented Sep 26, 2024

@glalanne In a few tests I had to repeat the SHOW ALL TABLES command to get the result. I haven't looked into the cause yet, but perhaps #8 is related?
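
For illustration only, a sketch of that retry workaround using the duckdb Python API (the connection setup is assumed to match the script earlier in the thread):

import duckdb

con = duckdb.connect()
# ... run the INSTALL/LOAD, CREATE SECRET, and ATTACH statements from the script above ...

# Repeat SHOW ALL TABLES a few times, since the first call has sometimes returned 0 rows.
tables = []
for attempt in range(3):
    tables = con.sql("SHOW ALL TABLES").fetchall()
    if tables:
        break
print(tables)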

@guanjieshen

I can get SHOW ALL TABLES to work, but credential vending support seems to be AWS-only right now? @gdubya, in the code you shared above:

CREATE SECRET (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'cli',
    ACCOUNT_NAME 'mystorageaccount'
);

Would this use the az CLI to authenticate to the storage account, as opposed to the SAS token provided by UC?
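
For context, and hedged since it is not confirmed in this thread: with PROVIDER CREDENTIAL_CHAIN and CHAIN 'cli', the Azure extension resolves credentials from the local az login session rather than anything vended by Unity Catalog. A minimal sketch to confirm that such a CLI credential is available locally (assumes the azure-identity package is installed):

from azure.identity import AzureCliCredential

# Mint a storage-scoped token from the local `az login` session; this is the kind of
# credential the CHAIN 'cli' setting relies on, independent of any UC-vended SAS token.
token = AzureCliCredential().get_token("https://storage.azure.com/.default")
print("Got az CLI token, expires at:", token.expires_on)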

@gdubya

gdubya commented Nov 30, 2024

@guanjieshen Correct, the secret is for Azure authentication. I haven't looked into credential vending yet, but I will in the next few weeks.

@nclaeys

nclaeys commented Jan 21, 2025

For me, the provided setup works only if you specify the Azure transport option as follows: SET azure_transport_option_type = 'curl';

It would be really nice if it supported credential vending, as that is currently a blocker for me. I am planning to implement it in dbt-duckdb using Python, as my C++ skills are nonexistent. Here are the steps needed in Python; maybe they can help you.

import json
import re

import duckdb
import requests
from azure.core.credentials import AccessToken
from azure.identity import DefaultAzureCredential

DATABRICKS_WORKSPACE_ENDPOINT = ""  # set to your workspace URL

class DatabricksCredentials:
    def __init__(self, sas_token, external_url):
        self.sas_token = sas_token
        self.external_url = external_url

    def __str__(self):
        return str(self.__dict__)

    def get_storage_account(self):
        # Extract the storage account name from an abfss:// URL such as
        # abfss://container@account.dfs.core.windows.net/path
        return re.search(r"@(.*)\.dfs\.", self.external_url).group(1)

def get_databricks_table_id(full_table_name: str, token: AccessToken):
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token.token}"}
    response = requests.get(
        url=f"{DATABRICKS_WORKSPACE_ENDPOINT}/api/2.1/unity-catalog/tables/{full_table_name}",
        headers=headers,
    )
    if response.status_code == 200:
        json_body = response.json()
        return json_body["table_id"]
    else:
        raise Exception(f"Failed to get databricks table id due to: {response.json()}")

def get_databricks_credentials(token: AccessToken, table_id: str):
    # Ask Unity Catalog for temporary (vended) credentials for this table.
    data = {
        "table_id": table_id,
        "operation": "READ_WRITE",
    }
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token.token}"}
    response = requests.post(
        url=f"{DATABRICKS_WORKSPACE_ENDPOINT}/api/2.1/unity-catalog/temporary-table-credentials",
        data=json.dumps(data),
        headers=headers,
    )
    if response.status_code == 200:
        json_body = response.json()
        return DatabricksCredentials(json_body["azure_user_delegation_sas"]["sas_token"], json_body["url"])
    else:
        raise Exception(f"Failed to get databricks credentials due to: {response.json()}")

def extract_table_name(sql: str):
    return re.search(r"delta_scan\('([^']+)'\)", sql).group(1)

def run_query(sql: str, table_name: str, microsoft_token: AccessToken):
    table_id = get_databricks_table_id(table_name, microsoft_token)
    databricks_creds = get_databricks_credentials(microsoft_token, table_id)
    print(databricks_creds.sas_token)
    # Register the vended SAS token as an Azure secret, then query the external URL directly.
    construct_query = f"""
        SET azure_transport_option_type = 'curl';
        CREATE OR REPLACE SECRET (
            TYPE AZURE,
            CONNECTION_STRING 'AccountName={databricks_creds.get_storage_account()};SharedAccessSignature={databricks_creds.sas_token}'
        );
    """
    duckdb.sql(construct_query)
    sql_using_external_url = sql.replace(table_name, databricks_creds.external_url)
    duckdb.sql(sql_using_external_url).show()

def query_data(sql: str, microsoft_token: AccessToken):
    table_name = extract_table_name(sql)
    print(f"translating tableName to path: {table_name}")
    return run_query(sql, table_name, microsoft_token)

def main():
    default_credential = DefaultAzureCredential()
    # Get a token for the built-in Databricks audience:
    # https://learn.microsoft.com/en-us/azure/databricks/dev-tools/service-prin-aad-token
    microsoft_token = default_credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default")
    query_data("SELECT * FROM delta_scan('duckdbtest.default.users_external')", microsoft_token)

if __name__ == "__main__":
    main()
