Spatial data validations #581
All commits in this PR should be signed (`git commit -S ...`). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
Please sign all commits
Pull Request Overview
This PR introduces specialized data validation functions for geospatial data types, implementing validation checks for geometry and geography columns.
- Adds `is_valid_geometry` and `is_valid_geography` functions for spatial data validation
- Implements row-level validation using Databricks-specific SQL functions
- Provides comprehensive test coverage for both validation functions
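The row-level idea behind these checks can be sketched as a SQL condition: a value fails when it is non-null but `try_to_geometry` (or `try_to_geography`) cannot parse it and returns NULL. This is a minimal sketch, not the code in `check_funcs.py`. The helper `invalid_geo_condition_sql` and the `GeoKind` enum are hypothetical, and the sketch assumes a Databricks runtime where the `try_to_*` functions are available. NULL inputs are deliberately not flagged, on the assumption that nullness is handled by a separate check.

```python
from enum import Enum


class GeoKind(str, Enum):
    """Spatial type a column is expected to parse as (hypothetical helper)."""

    GEOMETRY = "geometry"
    GEOGRAPHY = "geography"


def invalid_geo_condition_sql(column: str, kind: GeoKind) -> str:
    """Build a Spark SQL expression that yields a message for unparseable rows.

    Hypothetical sketch: assumes the Databricks SQL functions try_to_geometry /
    try_to_geography, which return NULL instead of raising on invalid input.
    The expression evaluates to NULL for passing rows and for NULL inputs.
    """
    # Quote the column name as a Spark SQL identifier; embedded backticks are doubled.
    quoted = "`" + column.replace("`", "``") + "`"
    fn = f"try_to_{kind.value}"
    return (
        f"CASE WHEN {quoted} IS NULL THEN NULL "
        f"WHEN {fn}({quoted}) IS NULL "
        f"THEN 'value is not a valid {kind.value}' "
        f"END"
    )
```

On Databricks, the string could be wrapped with `pyspark.sql.functions.expr` and attached via `withColumn` to inspect which rows would fail; building plain SQL keeps the sketch independent of any DQX-internal helper.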
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/databricks/labs/dqx/geo/check_funcs.py | Implements geospatial validation functions using try_to_geometry and try_to_geography |
tests/integration/test_row_checks_geo.py | Integration tests validating the behavior of geometry and geography check functions |
Force-pushed from `b8c0af8` to `90cb668`.
@mwojtyczka Could you do another review? I think the basics are in place.
Please add documentation, including a description of the check functions and examples that use classes and YAML, here: https://github.com/tdikland/dqx/blob/feat/geo/docs/dqx/docs/reference/quality_checks.mdx#row-level-checks-reference
Please add integration tests similar to `test_apply_checks_all_checks_using_classes` and `test_apply_checks_all_row_checks_as_yaml_with_streaming`.
Please add perf tests: https://github.com/tdikland/dqx/blob/feat/geo/tests/perf/test_apply_checks.py
Pull Request Overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.
LGTM
Copilot encountered an error and was unable to review this pull request.
Changes
Add specialised data validations for geospatial data.
The following checks are implemented:
Linked issues
Resolves #453
Tests