
Conversation

Contributor

@m-abulazm m-abulazm commented Oct 28, 2025

Changes

What does this PR do?

  • Support databricks credentials
  • Standardize usages of credentials manager

Linked issues

Progresses #1008

Functionality

  • added relevant user documentation
  • modified existing command: databricks labs lakebridge ...

Tests

  • manually tested
  • added unit tests
  • added integration tests


codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 63.76812% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.25%. Comparing base (0349ea8) to head (4870029).

Files with missing lines Patch % Lines
.../labs/lakebridge/connections/credential_manager.py 78.37% 8 Missing ⚠️
...s/assessments/synapse/dedicated_sqlpool_extract.py 0.00% 3 Missing ⚠️
.../assessments/synapse/monitoring_metrics_extract.py 0.00% 3 Missing ⚠️
.../assessments/synapse/serverless_sqlpool_extract.py 0.00% 3 Missing ⚠️
...resources/assessments/synapse/workspace_extract.py 0.00% 3 Missing ⚠️
...abs/lakebridge/assessments/configure_assessment.py 33.33% 2 Missing ⚠️
src/databricks/labs/lakebridge/config.py 86.66% 1 Missing and 1 partial ⚠️
...databricks/labs/lakebridge/assessments/profiler.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2123      +/-   ##
==========================================
+ Coverage   65.14%   65.25%   +0.11%     
==========================================
  Files         100      100              
  Lines        8503     8540      +37     
  Branches      885      886       +1     
==========================================
+ Hits         5539     5573      +34     
- Misses       2774     2777       +3     
  Partials      190      190              

☔ View full report in Codecov by Sentry.

github-actions bot commented Oct 28, 2025

✅ 51/51 passed, 11 flaky, 3m41s total

Flaky tests:

  • 🤪 test_validate_invalid_source_tech (144ms)
  • 🤪 test_validate_non_empty_tables (7ms)
  • 🤪 test_validate_table_not_found (1ms)
  • 🤪 test_validate_mixed_checks (313ms)
  • 🤪 test_validate_invalid_schema_path (0s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (19.558s)
  • 🤪 test_transpiles_informatica_to_sparksql (20.417s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (22.714s)
  • 🤪 test_transpile_teradata_sql (22.235s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (7.895s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (10.925s)

Running from acceptance #3131

# Conflicts:
#	src/databricks/labs/lakebridge/reconcile/connectors/jdbc_reader.py
#	src/databricks/labs/lakebridge/reconcile/connectors/oracle.py
@m-abulazm m-abulazm marked this pull request as ready for review November 10, 2025 14:31
@m-abulazm m-abulazm requested a review from a team as a code owner November 10, 2025 14:31
Contributor

@asnare asnare left a comment


I've highlighted some style and design issues that I think need to be resolved, but appreciate that this is the start of what is needed for #1008. On the testing side I really like that we've eliminated some monkey-patching during tests. (Some integration tests would be nice.)

One big concern I have is that I don't see where we're using the new provider because we don't pass the WorkspaceClient in anywhere that I can see. Can you elaborate a bit on the situation there?

@dataclass
class ReconcileCredentialConfig:
    vault_type: str  # supports local, env, databricks creds.
    source_creds: dict[str, str]
Contributor

I think we need a better name for this: it doesn't hold the credentials… it's more of a vault configuration?

Contributor Author

Agreed, but renaming this will require changes across three PRs. I would rather rename it after all the PRs have been reviewed.

Contributor

I'm afraid that's not a compelling argument, even though I appreciate the frustration. The situation is:

  • Once it's merged to main, we need to treat it as live: it could be released in that state without the other PRs. We can't just say "main is not releasable because these other PRs need to be merged": that would hamstring us in terms of when we release, particularly if something urgent pops up.
  • Even if we don't release, end users can install main in advance of a release. This is something we often do when asking them to test a bug-fix or check if something has been fixed in advance of a release being made.
  • Once it's 'live' we have to either live with it or those other PRs will then need to implement migration. That's even more work.
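To make the naming discussion above concrete, here is a minimal sketch of what a rename along the reviewer's suggestion might look like. The name `VaultConfig` and the validation in `__post_init__` are illustrative assumptions, not the PR's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical rename sketch: the object holds vault settings rather than
# credentials themselves, so a name like "VaultConfig" may describe it better.
# All names here are illustrative, not taken from this PR.
@dataclass
class VaultConfig:
    vault_type: str  # e.g. "local", "env", or "databricks"
    source_creds: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail fast on unsupported vault types instead of at first lookup.
        supported = {"local", "env", "databricks"}
        if self.vault_type not in supported:
            raise ValueError(f"Unsupported vault type: {self.vault_type!r}")
```

Validating eagerly in `__post_init__` would also surface configuration mistakes at load time rather than mid-reconcile, though that is a separate design choice from the rename itself.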

spark=spark,
ws=ws_client,
secret_scope=reconcile_config.secret_scope,
secret_scope=reconcile_config.creds.source_creds["__secret_scope"],
Contributor

Should this now be using the CredentialManager mechanism?

Contributor Author

Yes, and it is implemented in #2159 to keep the reviews more manageable. #2157 adds the prompts to configure the creds.

This PR can go to main first since it is backwards-compatible without the other two.
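For illustration, routing the secret-scope lookup through a single credential-manager entry point (instead of reading `source_creds["__secret_scope"]` directly at the call site) might look roughly like the sketch below. The `CredentialManager` class here is a stand-in with assumed names, not the class introduced in this PR or #2159:

```python
# Hypothetical sketch: centralize credential lookups so call sites do not
# reach into the raw source_creds dict. Names are assumptions for
# illustration only.
class CredentialManager:
    def __init__(self, creds: dict[str, str]) -> None:
        self._creds = creds

    def get(self, key: str) -> str:
        try:
            return self._creds[key]
        except KeyError as e:
            # One consistent error message for every missing credential.
            raise KeyError(f"Missing credential: {key}") from e

manager = CredentialManager({"__secret_scope": "reconcile-scope"})
secret_scope = manager.get("__secret_scope")
```

The benefit of funneling lookups through one method is that error handling and key conventions (such as the `__secret_scope` key) live in a single place.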

Comment on lines 66 to 67
except NotFound as e:
    raise KeyError(f'Secret does not exist with scope: {scope} and key: {key_only} : {e}') from e
Contributor

I think this is different to the other providers: they just return the key if the secret cannot be found, whereas here we raise an exception instead.

What do you think the providers should do? I think they need to be consistent.

Contributor Author

We should not raise an error: the return type should be optional, and it is up to the caller how to handle missing secrets.

I did not want to change too many things in one go, so the implementation of DatabricksSecretProvider you see here is copied from src/databricks/labs/lakebridge/reconcile/connectors/secrets.py without changing how it works, which led to some inconsistency.

I would address your comment in a later PR, if you don't mind.
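The Optional-returning contract proposed above can be sketched as follows. The protocol and the in-memory provider are illustrative stand-ins (not the PR's actual `DatabricksSecretProvider`), showing only the behavior under discussion: return `None` on a miss and let the caller decide:

```python
from typing import Optional, Protocol

# Sketch of the proposed contract: providers return None for a missing
# secret instead of raising, and the caller handles the miss. Class and
# method names are assumptions, not this PR's interfaces.
class SecretProvider(Protocol):
    def get_secret(self, key: str) -> Optional[str]: ...

class DictSecretProvider:
    """Toy in-memory provider standing in for env/local/Databricks vaults."""

    def __init__(self, secrets: dict[str, str]) -> None:
        self._secrets = secrets

    def get_secret(self, key: str) -> Optional[str]:
        # No exception on a miss: return None and let the caller react.
        return self._secrets.get(key)

provider = DictSecretProvider({"db_password": "s3cret"})
assert provider.get_secret("db_password") == "s3cret"
assert provider.get_secret("missing") is None
```

With this shape, every provider behaves identically on a miss, and a caller that does consider a missing secret fatal can still raise its own `KeyError` at the call site.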


Labels

  • internal — technical PRs, not end-user facing
  • tech debt — design flaws and other cascading effects

4 participants