Skip to content

Conversation

ghanse
Copy link
Contributor

@ghanse ghanse commented Sep 13, 2025

Changes

This PR introduces a new installation prompt which asks users to provide a workspace path when installing DQX as a tool.

I've also refactored the engine to use WorkspaceClient().get_workspace_id() instead of WorkspaceClient().current_user.me() to support running checks on group-assigned clusters.

Linked issues

Resolves #538, Resolves #545

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

Copy link

github-actions bot commented Sep 13, 2025

✅ 10/10 passed, 1 flaky, 1h2m12s total

Flaky tests:

  • 🤪 test_run_dqx_streaming_demo_native (3.989s)

Running from acceptance #2300

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces custom installation folder support for DQX, allowing users to specify a workspace path during installation instead of relying solely on default user home or global locations. The changes also refactor the workspace verification to use get_workspace_id() for better group-assigned cluster compatibility.

  • Added custom installation folder prompt and handling throughout the installation flow
  • Refactored workspace client verification to support group-assigned clusters
  • Updated configuration loading to support custom installation paths

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/integration/test_installation.py Added integration tests for custom folder installation scenarios
src/databricks/labs/dqx/utils.py Added utility function for creating custom installation instances
src/databricks/labs/dqx/installer/install.py Modified installer to accept and handle custom installation folders
src/databricks/labs/dqx/installer/config_provider.py Updated configuration prompting to display installation folder information
src/databricks/labs/dqx/engine.py Reformatted method calls for better readability
src/databricks/labs/dqx/config_loader.py Enhanced config loading to support custom installation folders
src/databricks/labs/dqx/config.py Added install_folder field to InstallationChecksStorageConfig
src/databricks/labs/dqx/checks_storage.py Updated to pass install_folder parameter to config loading
src/databricks/labs/dqx/base.py Changed workspace verification from current_user.me() to get_workspace_id()
docs/dqx/docs/reference/cli.mdx Updated CLI documentation to mention custom installation folder option
docs/dqx/docs/installation.mdx Enhanced installation documentation with custom folder information
Comments suppressed due to low confidence (1)

tests/integration/test_installation.py:1

  • This line was removed but the variable installation is still used on line 62. The code will fail with an undefined variable error.
import logging

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

"""
try:
ws.workspace.get_status(install_folder)
except NotInstalled:
Copy link
Preview

Copilot AI Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception NotInstalled is imported but the workspace API typically raises NotFound exceptions for missing paths, not NotInstalled. This should likely be except NotFound: instead.

Copilot uses AI. Check for mistakes.

* `DQX_FORCE_INSTALL=global databricks labs install dqx`: will force the installation to be for root only (`/Applications/dqx`)
* `DQX_FORCE_INSTALL=user databricks labs install dqx`: will force the installation to be for user only (`/Users/<user>/.dqx`)
**Custom Workspace Folder:**
If you provide a custom path during installation (e.g., `/Shared/dqx-team` or `/Users/shared-user/dqx-project`), DQX will be installed there.
Copy link
Contributor

@mwojtyczka mwojtyczka Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instruction should explain how to configure the custom path. I only shows the user and global installation options.

We should also add info that group assigned clusters are not compatible with the default user installation. Please test this manually as well on group assigned cluster.

"""

location: str = "installation" # retrieved from the installation config
run_config_name: str = "default" # to retrieve run config
product_name: str = "dqx"
assume_user: bool = True
install_folder: str | None = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment would be good with info what is the install folder if not provided

if install_folder:
logger.info(f"DQX will be installed in folder '{install_folder}'")
else:
logger.info("DQX will be installed in the default location (user home or global based on environment)")
Copy link
Contributor

@mwojtyczka mwojtyczka Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could report the exact location as we have access to env vars.

@@ -352,6 +351,35 @@ def test_workflows_deployment_creates_jobs_with_remove_after_tag():
wheels.assert_not_called()


Copy link
Contributor

@mwojtyczka mwojtyczka Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would also be good to add tests for uninstallation and update (overwrite existing installation)

Copy link
Contributor

@mwojtyczka mwojtyczka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also update documentation regarding: Testing Applications Using DQX
to update how workspace client can be mocked. It is currently using: ws = MagicMock(spec=WorkspaceClient, **{"current_user.me.return_value": None}) # mock the workspace client
Should be:
ws = MagicMock(spec=WorkspaceClient, **{"get_workspace_id.return_value": 0}) # mock the get_workspace_id method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants