Enable more authentication options for Databricks data source #2087

Open
ghjklw opened this issue May 22, 2024 · 2 comments

Comments

@ghjklw

ghjklw commented May 22, 2024

Soda Core uses databricks.sql.connect for authentication, which offers many authentication options, as documented in the Databricks SQL Connector for Python documentation.

Unfortunately, the way this is implemented by soda.data_sources.spark_data_source.databricks_connection_function limits it to personal access tokens:

def databricks_connection_function(host: str, http_path: str, token: str, database: str, schema: str, **kwargs):
    from databricks import sql

    user_agent_entry = f"soda-core-spark/{SODA_CORE_VERSION} (Databricks)"
    logging.getLogger("databricks.sql").setLevel(logging.INFO)
    connection = sql.connect(
        server_hostname=host,
        catalog=database,
        schema=schema,
        http_path=http_path,
        access_token=token,
        _user_agent_entry=user_agent_entry,
    )
    return connection

Likewise in SparkDataSource:

connection = connection_function(
    username=self.username,
    password=self.password,
    host=self.host,
    port=self.port,
    database=self.database,
    auth_method=self.auth_method,
    kerberos_service_name=self.kerberos_service_name,
    driver=self.driver,
    token=self.token,
    schema=self.schema,
    http_path=self.http_path,
    organization=self.organization,
    cluster=self.cluster,
    server_side_parameters=self.server_side_parameters,
    configuration=self.configuration,
    scheme=self.scheme,
)

A solution could be to extend the signature of databricks_connection_function to match databricks.sql.connect, for example:

from typing import Literal

def databricks_connection_function(
    host: str,
    http_path: str,
    database: str,
    schema: str,
    auth_type: Literal["databricks-oauth"] | None = None,
    token: str | None = None,
    username: str | None = None,
    password: str | None = None,
    client_id: str | None = None,
    client_secret: str | None = None,
    **kwargs,  # absorbs the other keyword arguments SparkDataSource passes, such as port and driver
):
    ...

These could then be passed through to databricks.sql.connect (except client_id and client_secret, which, if defined, require creating a credentials provider).
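To make the idea concrete, here is a rough sketch of how the options could be mapped onto keyword arguments for databricks.sql.connect. The helper name build_connect_kwargs and the precedence between options are my own assumptions, not existing Soda Core code. The service principal path assumes the databricks-sdk package (Config and oauth_service_principal from databricks.sdk.core), which is imported lazily so it is only needed when client_id and client_secret are set. Username/password is left out because I am not sure which connector versions still accept it.

```python
from typing import Any, Callable, Dict, Optional


def build_connect_kwargs(
    host: str,
    http_path: str,
    database: str,
    schema: str,
    auth_type: Optional[str] = None,
    token: Optional[str] = None,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: translate the proposed options into sql.connect kwargs."""
    kwargs: Dict[str, Any] = {
        "server_hostname": host,
        "http_path": http_path,
        "catalog": database,
        "schema": schema,
    }

    if client_id and client_secret:
        # OAuth machine-to-machine: sql.connect expects a callable credentials provider.
        def credentials_provider() -> Callable:
            # Lazy import (assumption: databricks-sdk is installed for this path).
            from databricks.sdk.core import Config, oauth_service_principal

            config = Config(
                host=f"https://{host}",
                client_id=client_id,
                client_secret=client_secret,
            )
            return oauth_service_principal(config)

        kwargs["credentials_provider"] = credentials_provider
    elif auth_type:
        # For example "databricks-oauth" for the interactive OAuth flow.
        kwargs["auth_type"] = auth_type
    elif token:
        # Current behaviour: personal access token.
        kwargs["access_token"] = token
    else:
        raise ValueError("No Databricks authentication option was provided")

    return kwargs
```

databricks_connection_function could then call sql.connect(**build_connect_kwargs(...), _user_agent_entry=user_agent_entry) and keep the rest of its body unchanged.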

Adding these options (in particular OAuth) would allow much more secure and robust connection alternatives!

@tools-soda

SAS-3512

@benjamin-pirotte

benjamin-pirotte commented May 24, 2024

Hi, thank you for creating the ticket! I will add the request to our backlog and prioritize accordingly.
If you have time, feel free to contribute, it would be greatly appreciated! https://github.com/sodadata/soda-core/blob/main/CONTRIBUTING.md.
