-
Notifications
You must be signed in to change notification settings - Fork 40
feat(rust/sedona): Improve sd_random_geometry() and expose as a Python function #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This reminds me that when I was writing up the pytest-benchmarking code, I realized it would be nice to have a random geometry rust function (instead of a table provider) that returns a single column, so that we could create multiple unique random geometry columns in a single query to use with functions that take two geometry inputs (e.g predicates). I can't think of many other use cases tho, so it doesn't seem very high priority. |
381126b to
7354cbf
Compare
| width = bounds[2] - bounds[0] | ||
| height = bounds[3] - bounds[1] | ||
| if size_min > width or size_min > height: | ||
| raise ValueError("size > height / 2 or width / 2 of bounds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message does not match the check. The error message talks about halfs.
| empty_rate: float = 0.0, | ||
| null_rate: float = 0.0, | ||
| seed: Optional[int] = None, | ||
| ) -> "sedonadb.dataframe.DataFrame": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Missing docstring
| f"Expected bounds as [xmin, ymin, xmax, ymax] but got {bounds}" | ||
| ) | ||
|
|
||
| width = bounds[2] - bounds[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be checks that xmin is smaller than xmax ?
Same for the y bounds below.
|
Thank you for the review! I will try to circle back to this in the next few days. One of the things I discovered here is that our random geometry generator can panic for some parameter combinations, which needed fixing at a lower level 😬 |
7354cbf to
77c084b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the sd_random_geometry() table function by adding input validation, ensuring exact row counts, improving polygon hole generation, and providing a Python wrapper. The improvements make the function more reliable and easier to use across SQL and Python interfaces.
Changes:
- Added comprehensive parameter validation to prevent panics from invalid inputs
- Modified row generation to output exact counts rather than approximate multiples
- Fixed polygon hole generation to avoid self-intersecting geometries by using consistent rotation angles
- Introduced Python wrapper function and shorter parameter names for more compact usage
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona/src/record_batch_reader_provider.rs | Made RowLimitedIterator public and added partition range check |
| rust/sedona/src/random_geometry_provider.rs | Added validation, exact row limiting, shorter parameter names, and non-deterministic default seed |
| rust/sedona-testing/src/datagen.rs | Added validate() method with comprehensive checks and improved error messages |
| rust/sedona-testing/src/benchmark_util.rs | Updated error handling for Uniform distribution creation |
| python/sedonadb/tests/*.py | Replaced SQL-based random geometry generation with Python wrapper calls |
| python/sedonadb/python/sedonadb/functions/table.py | New Python wrapper for sd_random_geometry function |
| python/sedonadb/python/sedonadb/functions/init.py | New Functions accessor class |
| python/sedonadb/python/sedonadb/context.py | Added funcs property to SedonaContext |
| benchmarks/*.py | Updated test parameter names to match new shorter naming convention |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if self.bounds.width() <= 0.0 || self.bounds.height() <= 0.0 { | ||
| return plan_err!("Expected valid bounds but got {:?}", self.bounds); | ||
| } | ||
|
|
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate validation logic for bounds. Lines 527-529 and 531-533 check the exact same condition. Remove one of these duplicate checks.
| if self.bounds.width() <= 0.0 || self.bounds.height() <= 0.0 { | |
| return plan_err!("Expected valid bounds but got {:?}", self.bounds); | |
| } |
This PR improves the table function
sd_random_geometry()in a few ways:SELECT * FROM sd_random_geometry() LIMIT ...and respecting an exact value made it possible to use the Python wrapper more compactly to replace existing SQL text usage.I did as much of possible of this in Rust to ensure that the Python and SQL versions of the function are aligned (and to make it easier to add an R binding later).
This PR also implements a Python wrapper to make it easy to call this function (this was the original intent of this PR!). I also replaced any
SELECT * FROM sd_random_geometry()with the Python version (sd.funcs.table.sd_random_geometry(...)) which made many of the tests much shorter.