
@tdikland (Contributor) commented Sep 19, 2025

Changes

Add specialised data validations for geospatial data.

The following checks are implemented:

  • is_longitude
  • is_latitude
  • is_geometry
  • is_geography
  • is_point
  • is_linestring
  • is_polygon
  • is_multipoint
  • is_multilinestring
  • is_multipolygon
  • is_geometrycollection
  • is_ogc_valid
  • is_non_empty_geometry
  • has_x_coordinate_between
  • has_y_coordinate_between
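The range-based checks in the list above can be illustrated with a minimal pure-Python sketch of their row-level semantics. This is not the DQX implementation, which operates on Spark columns. The helper names are hypothetical, and the sketch makes several assumptions: longitude must lie in [-180, 180], latitude must lie in [-90, 90], null values pass (null handling is left to a separate not-null check), and has_x_coordinate_between / has_y_coordinate_between require every vertex of a geometry to fall inside the given bounds.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helpers sketching the row-level semantics of the range checks.
# Each returns an error message for a failing value, or None if the value passes.
# Assumption: null values pass; null handling is left to a separate not-null check.

Coord = Tuple[float, float]  # (x, y) vertex of a geometry


def _in_range(value: float, lower: float, upper: float) -> bool:
    return lower <= value <= upper


def check_longitude(value: Optional[float]) -> Optional[str]:
    """Sketch of is_longitude: value must lie in [-180, 180]."""
    if value is None or _in_range(value, -180.0, 180.0):
        return None
    return f"value {value} is not a valid longitude (expected -180 to 180)"


def check_latitude(value: Optional[float]) -> Optional[str]:
    """Sketch of is_latitude: value must lie in [-90, 90]."""
    if value is None or _in_range(value, -90.0, 90.0):
        return None
    return f"value {value} is not a valid latitude (expected -90 to 90)"


def check_coordinate_between(
    vertices: Optional[Sequence[Coord]], axis: int, lower: float, upper: float
) -> Optional[str]:
    """Sketch of has_x/y_coordinate_between (assumed semantics): the coordinate
    on the given axis (0 = x, 1 = y) of every vertex must lie in [lower, upper]."""
    if vertices is None:
        return None
    outside = [v for v in vertices if not _in_range(v[axis], lower, upper)]
    if not outside:
        return None
    name = "x" if axis == 0 else "y"
    return f"{len(outside)} vertex/vertices have {name} outside [{lower}, {upper}]"


if __name__ == "__main__":
    print(check_longitude(181.0))
    print(check_latitude(45.5))
    print(check_coordinate_between([(0.0, 0.0), (12.0, 3.0)], 0, 0.0, 10.0))
```

Returning a message or None per row mirrors how row-level checks typically report failures. In the real checks, the bounds and the choice of axis would come from the check's arguments.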

Linked issues

Resolves #453

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests


All commits in the PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

@mwojtyczka (Contributor) left a comment

Please sign all commits.

@mwojtyczka requested a review from Copilot September 22, 2025 15:28

@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR introduces specialized data validation functions for geospatial data types, implementing validation checks for geometry and geography columns.

  • Adds is_valid_geometry and is_valid_geography functions for spatial data validation
  • Implements row-level validation using Databricks-specific SQL functions
  • Provides comprehensive test coverage for both validation functions
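The validity checks described above follow a try-parse pattern. Assuming, as the `try_` prefix conventionally indicates in Databricks SQL, that try_to_geometry and try_to_geography return NULL instead of raising on unparseable input, a row fails when its input is non-null but the parsed result is null. The sketch below mimics that pattern in plain Python. The parser `try_parse_wkt_type` is a deliberately crude, hypothetical stand-in for the Databricks function: it only inspects the leading WKT type keyword and checks that parentheses balance, so it is far weaker than real WKT parsing. The same parsed type can also back geometry-type checks such as is_point or is_polygon.

```python
import re
from typing import Optional

# Geometry type keywords recognised by this hypothetical parser.
WKT_TYPES = {
    "POINT", "LINESTRING", "POLYGON", "MULTIPOINT",
    "MULTILINESTRING", "MULTIPOLYGON", "GEOMETRYCOLLECTION",
}

# Leading keyword followed by "(" or "EMPTY", e.g. "POINT(1 2)" or "POINT EMPTY".
_WKT_HEAD = re.compile(r"^\s*([A-Za-z]+)\s*(\(|EMPTY\b)", re.IGNORECASE)


def try_parse_wkt_type(text: str) -> Optional[str]:
    """Crude stand-in for a try-parse function: return the upper-cased geometry
    type if `text` superficially looks like WKT, otherwise None (never raises)."""
    match = _WKT_HEAD.match(text)
    if match is None or match.group(1).upper() not in WKT_TYPES:
        return None
    depth = 0
    for ch in text:  # minimal structural check: parentheses must balance
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return match.group(1).upper() if depth == 0 else None


def check_valid_geometry(value: Optional[str]) -> Optional[str]:
    """Fail when the input is present but cannot be parsed (try-parse gives None)."""
    if value is None or try_parse_wkt_type(value) is not None:
        return None
    return f"value '{value}' is not a valid geometry"


def check_geometry_type(value: Optional[str], expected: str) -> Optional[str]:
    """Fail when the parsed geometry type differs from `expected`, e.g. "POINT"."""
    if value is None:
        return None
    actual = try_parse_wkt_type(value)
    if actual == expected.upper():
        return None
    return f"value '{value}' is not a {expected.upper()} (got {actual})"
```

Returning a null result for a parse failure, rather than raising an exception, lets a single bad row be reported without failing the whole batch. This is why a try-parse function is a good fit for row-level checks.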

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Changed files:

  • src/databricks/labs/dqx/geo/check_funcs.py: implements geospatial validation functions using try_to_geometry and try_to_geography
  • tests/integration/test_row_checks_geo.py: integration tests validating the behavior of the geometry and geography check functions


@tdikland force-pushed the feat/geo branch 2 times, most recently from b8c0af8 to 90cb668 September 26, 2025 06:28

@tdikland (Contributor, Author) commented

@mwojtyczka Can we do another review? I think the basics are there.

@tdikland marked this pull request as ready for review September 27, 2025 09:46
@tdikland requested a review from a team as a code owner September 27, 2025 09:46
@tdikland requested review from grusin-db and removed request for a team September 27, 2025 09:46
@tdikland changed the title from "[DRAFT] Spatial data validations" to "Spatial data validations" Sep 27, 2025

@mwojtyczka (Contributor) left a comment

Please add documentation here: https://github.com/tdikland/dqx/blob/feat/geo/docs/dqx/docs/reference/quality_checks.mdx#row-level-checks-reference

The documentation should include a description of the check functions and examples that define them using both classes and YAML.

Please add integration tests similar to test_apply_checks_all_checks_using_classes and test_apply_checks_all_row_checks_as_yaml_with_streaming.

Please add perf tests: https://github.com/tdikland/dqx/blob/feat/geo/tests/perf/test_apply_checks.py

@mwojtyczka requested a review from Copilot September 30, 2025 10:32

@Copilot (Copilot AI) left a comment

Pull Request Overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.



@mwojtyczka requested a review from Copilot October 2, 2025 20:21

@mwojtyczka (Contributor) left a comment

LGTM

@Copilot (Copilot AI) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@mwojtyczka merged commit d2a37a8 into databrickslabs:main Oct 2, 2025
14 checks passed
Development

Successfully merging this pull request may close these issues:

  • [FEATURE]: Geo-Spatial Data Validation

3 participants