@paleolimbot
Member

@paleolimbot paleolimbot commented Sep 16, 2025

This PR improves the table function sd_random_geometry() in a few ways:

  • Parameters are now validated. Before it was easy to make the data generator panic by inputting nonsensical values. These values now error usefully rather than crash the Python/SQL session.
  • The number of rows output by the table function is now exact. Most actual usage of the function was along the lines of SELECT * FROM sd_random_geometry() LIMIT ..., and respecting an exact value made it possible to use the Python wrapper more compactly to replace existing SQL text usage.
  • Polygon holes are now generated in such a way that the geometry is always (hopefully) valid. Before we were applying a random rotation to the shell and the hole independently which led to intersecting edges in some cases.
  • Argument names are shorter and can be specified more compactly (e.g., `"size": 4`). In SQL these are typed as JSON and in Python the extra long Rust names were a bit ugly. Maybe the Rust names should be shorter too but that is mostly internal/test usage (and there is autocomplete to help).
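
To illustrate the polygon-hole point above, here is a minimal Python sketch of why sharing one rotation angle keeps a hole inside its shell. This is not the Rust implementation in this PR: the helper name `ring`, the regular-polygon shape, and the 0.5 scale factor are assumptions chosen only for illustration.

```python
import math
import random


def ring(center, radius, n_vertices, rotation):
    """Return a closed ring of n_vertices points on a circle, rotated by `rotation` radians."""
    cx, cy = center
    points = [
        (
            cx + radius * math.cos(rotation + 2 * math.pi * i / n_vertices),
            cy + radius * math.sin(rotation + 2 * math.pi * i / n_vertices),
        )
        for i in range(n_vertices)
    ]
    return points + [points[0]]  # repeat the first point to close the ring


rng = random.Random(398)
rotation = rng.uniform(0, 2 * math.pi)  # one angle, drawn once and reused

# Shell and hole share the same center and rotation, so the hole is a
# scaled-down copy of the shell and its edges cannot cross the shell's edges.
shell = ring((50.0, 50.0), 10.0, 5, rotation)
hole = ring((50.0, 50.0), 5.0, 5, rotation)
```

Drawing a second, independent angle for the hole would remove that guarantee, which is the failure mode described in the bullet on polygon holes.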

I did as much of this as possible in Rust to ensure that the Python and SQL versions of the function are aligned (and to make it easier to add an R binding later).

This PR also implements a Python wrapper to make it easy to call this function (this was the original intent of this PR!). I also replaced any SELECT * FROM sd_random_geometry() with the Python version (sd.funcs.table.sd_random_geometry(...)) which made many of the tests much shorter.

import sedona.db

sd = sedona.db.connect()
sd.funcs.table.sd_random_geometry("Point", 5, seed=398).show()
#> ┌───────┬───────────────────┬─────────────────────────────────────────────┐
#> │   id  ┆        dist       ┆                   geometry                  │
#> │ int32 ┆      float64      ┆                   geometry                  │
#> ╞═══════╪═══════════════════╪═════════════════════════════════════════════╡
#> │     0 ┆ 87.44080142454678 ┆ POINT(26.260051231847005 71.23812535164866) │
#> ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │     1 ┆ 65.64801018718806 ┆ POINT(46.27832897550079 35.056727892322506) │
#> ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │     2 ┆   91.319203207724 ┆ POINT(17.87026960981337 25.00219940022248)  │
#> ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │     3 ┆ 71.84735302805376 ┆ POINT(76.60472252678774 37.79648618764681)  │
#> ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │     4 ┆ 8.731744436070565 ┆ POINT(6.703936773673624 79.32432691523339)  │
#> └───────┴───────────────────┴─────────────────────────────────────────────┘
sd.funcs.table.sd_random_geometry("Geometry", 100, seed=48).to_pandas().plot()
[image: output]

@petern48
Collaborator

This reminds me that when I was writing up the pytest-benchmarking code, I realized it would be nice to have a random geometry Rust function (instead of a table provider) that returns a single column, so that we could create multiple unique random geometry columns in a single query to use with functions that take two geometry inputs (e.g., predicates). I can't think of many other use cases tho, so it doesn't seem very high priority.

width = bounds[2] - bounds[0]
height = bounds[3] - bounds[1]
if size_min > width or size_min > height:
    raise ValueError("size > height / 2 or width / 2 of bounds")
Member

The error message does not match the check: the message talks about halves, but the check compares against the full width and height.
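
One way to make the check and the message agree is sketched below. Whether the intended limit is the full extent or half of it is not stated in this thread, so the half-extent rule used here is an assumption taken from the wording of the message, as are the function name `check_size` and its signature.

```python
def check_size(bounds, size):
    """Raise ValueError if `size` exceeds half the width or height of `bounds`.

    `bounds` is [xmin, ymin, xmax, ymax]. The half-extent limit is an
    assumption based on the original error message, not a confirmed rule.
    """
    width = bounds[2] - bounds[0]
    height = bounds[3] - bounds[1]
    if size > width / 2 or size > height / 2:
        raise ValueError(
            f"size ({size}) must not exceed half the width ({width / 2}) "
            f"or half the height ({height / 2}) of bounds"
        )


check_size([0, 0, 100, 100], 50)  # passes: exactly half the extent
```

Including the computed limits in the message keeps the text and the condition from drifting apart again.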

empty_rate: float = 0.0,
null_rate: float = 0.0,
seed: Optional[int] = None,
) -> "sedonadb.dataframe.DataFrame":
Member

nit: Missing docstring

f"Expected bounds as [xmin, ymin, xmax, ymax] but got {bounds}"
)

width = bounds[2] - bounds[0]
Member

Should there be checks that xmin is smaller than xmax?
Same for the y bounds below.
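
A minimal sketch of what such checks could look like in the Python wrapper. The helper name `validate_bounds` is hypothetical, and the strict inequality (zero-width bounds rejected) is an assumption, chosen to agree with the `width() <= 0.0` check quoted from the Rust side later in this thread.

```python
def validate_bounds(bounds):
    """Check that bounds is [xmin, ymin, xmax, ymax] with positive width and height.

    Returns (width, height) so callers can reuse the computed extents.
    """
    if len(bounds) != 4:
        raise ValueError(
            f"Expected bounds as [xmin, ymin, xmax, ymax] but got {bounds}"
        )
    xmin, ymin, xmax, ymax = bounds
    if xmin >= xmax:
        raise ValueError(f"Expected xmin < xmax but got xmin={xmin}, xmax={xmax}")
    if ymin >= ymax:
        raise ValueError(f"Expected ymin < ymax but got ymin={ymin}, ymax={ymax}")
    return xmax - xmin, ymax - ymin


width, height = validate_bounds([0, 0, 100, 50])
```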

@paleolimbot
Member Author

Thank you for the review! I will try to circle back to this in the next few days. One of the things I discovered here is that our random geometry generator can panic for some parameter combinations, which needed fixing at a lower level 😬

@paleolimbot paleolimbot changed the title feat(python/sedonadb): Expose random_geometry as a Python function feat(python/sedonadb): Make random_geometry safer and expose as a Python function Jan 11, 2026
@paleolimbot paleolimbot changed the title feat(python/sedonadb): Make random_geometry safer and expose as a Python function feat(python/sedonadb): Improve sd_random_geometry() and expose as a Python function Jan 11, 2026
@paleolimbot paleolimbot changed the title feat(python/sedonadb): Improve sd_random_geometry() and expose as a Python function feat(rust/sedona): Improve sd_random_geometry() and expose as a Python function Jan 11, 2026
@paleolimbot paleolimbot requested a review from Copilot January 11, 2026 06:56
Contributor

Copilot AI left a comment

Pull request overview

This PR enhances the sd_random_geometry() table function by adding input validation, ensuring exact row counts, improving polygon hole generation, and providing a Python wrapper. The improvements make the function more reliable and easier to use across SQL and Python interfaces.

Changes:

  • Added comprehensive parameter validation to prevent panics from invalid inputs
  • Modified row generation to output exact counts rather than approximate multiples
  • Fixed polygon hole generation to avoid self-intersecting geometries by using consistent rotation angles
  • Introduced Python wrapper function and shorter parameter names for more compact usage
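
The exact-row-count change can be pictured with a small Python analogue of a row-limiting iterator. This is only a sketch of the general technique, not the `RowLimitedIterator` from this PR: the names `limit_rows` and `generate_batches`, the list-of-rows batch representation, and the batch size of 1024 are all assumptions.

```python
def limit_rows(batches, num_rows):
    """Yield batches (lists of rows) until exactly num_rows rows have been emitted."""
    remaining = num_rows
    if remaining <= 0:
        return
    for batch in batches:
        if len(batch) >= remaining:
            yield batch[:remaining]  # trim the final batch to the exact count
            return
        remaining -= len(batch)
        yield batch


def generate_batches(batch_size):
    """Stand-in generator that always emits full batches of `batch_size` rows."""
    start = 0
    while True:
        yield list(range(start, start + batch_size))
        start += batch_size


limited = list(limit_rows(generate_batches(1024), 2500))
total = sum(len(batch) for batch in limited)
```

Without the trimming step, a generator that only emits whole batches would return a multiple of the batch size, which is the "approximate multiples" behavior the summary refers to.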

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| rust/sedona/src/record_batch_reader_provider.rs | Made RowLimitedIterator public and added partition range check |
| rust/sedona/src/random_geometry_provider.rs | Added validation, exact row limiting, shorter parameter names, and non-deterministic default seed |
| rust/sedona-testing/src/datagen.rs | Added validate() method with comprehensive checks and improved error messages |
| rust/sedona-testing/src/benchmark_util.rs | Updated error handling for Uniform distribution creation |
| python/sedonadb/tests/*.py | Replaced SQL-based random geometry generation with Python wrapper calls |
| python/sedonadb/python/sedonadb/functions/table.py | New Python wrapper for sd_random_geometry function |
| python/sedonadb/python/sedonadb/functions/__init__.py | New Functions accessor class |
| python/sedonadb/python/sedonadb/context.py | Added funcs property to SedonaContext |
| benchmarks/*.py | Updated test parameter names to match new shorter naming convention |

Comment on lines +531 to +534
if self.bounds.width() <= 0.0 || self.bounds.height() <= 0.0 {
return plan_err!("Expected valid bounds but got {:?}", self.bounds);
}

Copilot AI Jan 11, 2026

Duplicate validation logic for bounds. Lines 527-529 and 531-533 check the exact same condition. Remove one of these duplicate checks.

Suggested change
if self.bounds.width() <= 0.0 || self.bounds.height() <= 0.0 {
return plan_err!("Expected valid bounds but got {:?}", self.bounds);
}
