Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Oct 1, 2025

📄 87% (0.87x) speedup for polygon_to_bbox in doctr/utils/geometry.py

⏱️ Runtime : 805 microseconds 430 microseconds (best of 412 runs)

📝 Explanation and details

The optimization replaces the original zip(*polygon) approach with a single-pass iterator-based algorithm that eliminates intermediate data structure allocations and reduces function call overhead.

Key optimizations:

  1. Eliminates zip(*polygon) overhead: The original creates two temporary tuples containing all x and y coordinates, which requires memory allocation and unpacking operations. The optimized version processes coordinates directly without intermediate collections.

  2. Single-pass min/max computation: Instead of calling min() and max() functions (which internally iterate through the data), the optimized version computes all four values (min_x, max_x, min_y, max_y) in one iteration using simple comparison operations.

  3. Reduces function call overhead: The original makes 4 function calls (min(x), min(y), max(x), max(y)), while the optimized version uses direct comparisons that are faster than function calls in Python.

Performance characteristics based on test results:

  • Small polygons (1-4 points): 50-115% speedup due to eliminated overhead
  • Medium polygons: 60-95% speedup from avoiding temporary data structures
  • Large polygons (1000+ points): 73-177% speedup where single-pass iteration really shines, especially when coordinates have patterns (increasing/decreasing sequences show highest gains)
  • Edge cases: Maintains identical error handling behavior for empty polygons while still providing 27-32% speedup

The optimization is particularly effective for larger polygons where memory allocation costs and multiple iterations become significant bottlenecks.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 13 Passed
🌀 Generated Regression Tests 49 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Existing Unit Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
common/test_utils_geometry.py::test_polygon_to_bbox 2.54μs 1.34μs 89.0%✅
🌀 Generated Regression Tests and Runtime
from typing import List, Tuple

# imports
import pytest  # used for our unit tests
from doctr.utils.geometry import polygon_to_bbox

# function to test
# Copyright (C) 2021-2025, Mindee.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.


BoundingBox = Tuple[Tuple[float, float], Tuple[float, float]]
Polygon4P = List[Tuple[float, float]]
from doctr.utils.geometry import polygon_to_bbox

# unit tests

# ----------------------
# BASIC TEST CASES
# ----------------------

def test_square_polygon():
    # Test with a basic square polygon
    polygon = [(0, 0), (0, 2), (2, 2), (2, 0)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.56μs -> 1.33μs (92.0% faster)

def test_rectangle_polygon():
    # Test with a basic rectangle polygon
    polygon = [(1, 1), (1, 4), (5, 4), (5, 1)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.26μs -> 1.33μs (69.6% faster)

def test_polygon_with_negative_coordinates():
    # Test with negative coordinates
    polygon = [(-1, -2), (-3, 4), (5, -6), (7, 8)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.45μs -> 1.51μs (62.2% faster)

def test_polygon_with_floats():
    # Test with float coordinates
    polygon = [(0.5, 1.5), (2.2, 3.3), (4.4, 1.1), (2.2, 0.0)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.39μs -> 1.55μs (54.3% faster)

def test_polygon_with_three_points():
    # Test with a triangle
    polygon = [(2, 3), (5, 7), (1, 9)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.22μs -> 1.26μs (76.1% faster)

# ----------------------
# EDGE TEST CASES
# ----------------------

def test_polygon_single_point():
    # Test with a single point (degenerate polygon)
    polygon = [(4, 5)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 1.68μs -> 867ns (93.5% faster)

def test_polygon_two_points():
    # Test with two points (line segment)
    polygon = [(1, 2), (3, 4)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 1.88μs -> 1.07μs (75.4% faster)

def test_polygon_all_points_same():
    # Test with all points being the same
    polygon = [(7, 7), (7, 7), (7, 7), (7, 7)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.25μs -> 1.15μs (95.2% faster)

def test_polygon_with_colinear_points():
    # Test with all points colinear (vertical line)
    polygon = [(2, 1), (2, 3), (2, 5), (2, 7)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.22μs -> 1.19μs (87.0% faster)

def test_polygon_with_colinear_horizontal_points():
    # Test with all points colinear (horizontal line)
    polygon = [(1, 5), (3, 5), (7, 5), (9, 5)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.13μs -> 1.20μs (77.5% faster)

def test_polygon_with_zero_area():
    # Test with points forming a zero area polygon (all on a line, but not colinear in one direction)
    polygon = [(1, 1), (2, 2), (3, 3), (4, 4)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.06μs -> 1.22μs (69.0% faster)

def test_polygon_with_min_max_swapped_order():
    # Test with points not ordered by min/max
    polygon = [(5, 5), (1, 9), (9, 1), (3, 7)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.10μs -> 1.27μs (64.7% faster)

def test_polygon_with_duplicate_points():
    # Test with duplicate points mixed in
    polygon = [(1, 2), (3, 4), (1, 2), (3, 4)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.04μs -> 1.24μs (64.9% faster)

def test_polygon_with_large_and_small_values():
    # Test with a mix of very large and very small values
    polygon = [(1e-10, 1e10), (1e10, 1e-10), (-1e10, -1e10), (0, 0)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.73μs -> 1.84μs (48.3% faster)

def test_polygon_with_inf_and_nan():
    # Test with inf and nan values
    import math
    polygon = [(1, 2), (math.inf, 3), (4, math.nan), (5, 6)]
    # Should raise ValueError due to nan
    with pytest.raises(ValueError):
        x, y = zip(*polygon)
        # Check for nan in coordinates
        if any(math.isnan(coord) for coord in x + y):
            raise ValueError("Polygon contains NaN coordinate")
        polygon_to_bbox(polygon)

def test_polygon_with_inf_only():
    # Test with inf values only
    import math
    polygon = [(math.inf, 1), (2, math.inf), (3, 4)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.64μs -> 1.80μs (47.0% faster)

def test_polygon_with_empty_list():
    # Test with empty polygon (should raise an error)
    polygon = []
    with pytest.raises(ValueError):
        polygon_to_bbox(polygon) # 3.20μs -> 2.52μs (27.2% faster)

def test_polygon_with_non_tuple_points():
    # Test with points not being tuples (should raise an error)
    polygon = [[1, 2], [3, 4], [5, 6], [7, 8]]  # lists instead of tuples
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 2.50μs -> 1.55μs (60.8% faster)

def test_polygon_with_non_numeric_values():
    # Test with a non-numeric value in the polygon (should raise an error)
    polygon = [(1, 2), (3, "foo"), (5, 6), (7, 8)]
    with pytest.raises(TypeError):
        polygon_to_bbox(polygon) # 3.22μs -> 2.34μs (37.9% faster)

# ----------------------
# LARGE SCALE TEST CASES
# ----------------------

def test_large_polygon_1000_points():
    # Test with a polygon of 1000 points forming a diagonal line
    polygon = [(i, i * 2) for i in range(1000)]
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 70.1μs -> 35.1μs (99.5% faster)

def test_large_polygon_randomized():
    # Test with 1000 random points
    import random
    random.seed(42)  # deterministic!
    xs = [random.uniform(-1000, 1000) for _ in range(1000)]
    ys = [random.uniform(-2000, 2000) for _ in range(1000)]
    polygon = list(zip(xs, ys))
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 63.5μs -> 29.6μs (115% faster)

def test_large_polygon_all_same_point():
    # Test with 1000 points all the same
    polygon = [(5.5, 7.7)] * 1000
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 63.1μs -> 32.2μs (95.6% faster)

def test_large_polygon_extreme_values():
    # Test with 1000 points, half at -1e6, half at 1e6
    polygon = [(-1e6, -1e6)] * 500 + [(1e6, 1e6)] * 500
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 63.7μs -> 47.5μs (34.1% faster)

def test_large_polygon_performance():
    # Test that function runs quickly for large input
    import time
    polygon = [(i, i * 2) for i in range(1000)]
    start = time.time()
    codeflash_output = polygon_to_bbox(polygon); bbox = codeflash_output # 63.6μs -> 36.6μs (73.9% faster)
    elapsed = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import List, Tuple

# imports
import pytest  # used for our unit tests
from doctr.utils.geometry import polygon_to_bbox

# function to test
# Copyright (C) 2021-2025, Mindee.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.


BoundingBox = Tuple[Tuple[float, float], Tuple[float, float]]
Polygon4P = List[Tuple[float, float]]
from doctr.utils.geometry import polygon_to_bbox

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_square_polygon():
    # A simple square
    polygon = [(1, 1), (1, 3), (3, 3), (3, 1)]
    expected_bbox = ((1, 1), (3, 3))
    codeflash_output = polygon_to_bbox(polygon) # 2.35μs -> 1.31μs (79.6% faster)

def test_rectangle_polygon():
    # A rectangle not aligned to axes
    polygon = [(2, 5), (2, 8), (6, 8), (6, 5)]
    expected_bbox = ((2, 5), (6, 8))
    codeflash_output = polygon_to_bbox(polygon) # 2.27μs -> 1.31μs (73.5% faster)

def test_triangle_polygon():
    # A triangle
    polygon = [(0, 0), (2, 2), (4, 0)]
    expected_bbox = ((0, 0), (4, 2))
    codeflash_output = polygon_to_bbox(polygon) # 2.08μs -> 1.20μs (74.1% faster)

def test_polygon_with_negative_coordinates():
    # Polygon with negative coordinates
    polygon = [(-2, -3), (0, -1), (-1, 2), (-3, 1)]
    expected_bbox = ((-3, -3), (0, 2))
    codeflash_output = polygon_to_bbox(polygon) # 2.19μs -> 1.32μs (65.5% faster)

def test_polygon_with_float_coordinates():
    # Polygon with float coordinates
    polygon = [(1.2, 3.4), (2.5, 5.6), (4.7, 2.8), (3.1, 4.2)]
    expected_bbox = ((1.2, 2.8), (4.7, 5.6))
    codeflash_output = polygon_to_bbox(polygon) # 2.49μs -> 1.52μs (63.4% faster)

def test_polygon_order_independence():
    # Polygon with points in random order
    polygon = [(5, 2), (1, 8), (3, 4), (7, 6)]
    expected_bbox = ((1, 2), (7, 8))
    codeflash_output = polygon_to_bbox(polygon) # 2.21μs -> 1.33μs (66.3% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_single_point_polygon():
    # Polygon with a single point
    polygon = [(2, 2)]
    expected_bbox = ((2, 2), (2, 2))
    codeflash_output = polygon_to_bbox(polygon) # 1.78μs -> 830ns (115% faster)

def test_two_point_polygon():
    # Polygon with two points (degenerate case)
    polygon = [(1, 5), (4, 2)]
    expected_bbox = ((1, 2), (4, 5))
    codeflash_output = polygon_to_bbox(polygon) # 1.97μs -> 1.06μs (85.8% faster)

def test_all_points_same():
    # All points are the same
    polygon = [(3, 3), (3, 3), (3, 3), (3, 3)]
    expected_bbox = ((3, 3), (3, 3))
    codeflash_output = polygon_to_bbox(polygon) # 2.23μs -> 1.14μs (96.2% faster)

def test_colinear_points_horizontal():
    # All points are colinear horizontally
    polygon = [(1, 2), (3, 2), (5, 2), (7, 2)]
    expected_bbox = ((1, 2), (7, 2))
    codeflash_output = polygon_to_bbox(polygon) # 2.21μs -> 1.22μs (81.1% faster)

def test_colinear_points_vertical():
    # All points are colinear vertically
    polygon = [(4, 1), (4, 3), (4, 5), (4, 7)]
    expected_bbox = ((4, 1), (4, 7))
    codeflash_output = polygon_to_bbox(polygon) # 2.14μs -> 1.24μs (73.1% faster)

def test_polygon_with_zero_coordinates():
    # Polygon with zero coordinates
    polygon = [(0, 0), (0, 5), (5, 0), (5, 5)]
    expected_bbox = ((0, 0), (5, 5))
    codeflash_output = polygon_to_bbox(polygon) # 2.15μs -> 1.25μs (71.1% faster)

def test_polygon_with_large_and_small_values():
    # Polygon with very large and very small values
    polygon = [(-1e9, 1e9), (1e9, -1e9), (0, 0), (1e-9, -1e-9)]
    expected_bbox = ((-1e9, -1e9), (1e9, 1e9))
    codeflash_output = polygon_to_bbox(polygon) # 2.72μs -> 1.89μs (43.8% faster)

def test_polygon_with_duplicate_points():
    # Polygon with duplicate points
    polygon = [(1, 2), (3, 4), (1, 2), (3, 4)]
    expected_bbox = ((1, 2), (3, 4))
    codeflash_output = polygon_to_bbox(polygon) # 2.15μs -> 1.28μs (68.0% faster)

def test_polygon_with_min_max_swapped():
    # Ensure min/max are not affected by order
    polygon = [(10, 20), (20, 10), (15, 25), (25, 15)]
    expected_bbox = ((10, 10), (25, 25))
    codeflash_output = polygon_to_bbox(polygon) # 2.16μs -> 1.29μs (67.3% faster)

def test_polygon_with_mixed_int_float():
    # Polygon with mixed int and float values
    polygon = [(1, 2.5), (3.7, 4), (2, 1.1), (4.2, 3)]
    expected_bbox = ((1, 1.1), (4.2, 4))
    codeflash_output = polygon_to_bbox(polygon) # 2.92μs -> 2.16μs (35.5% faster)

def test_polygon_with_three_points():
    # Polygon with three points (triangle)
    polygon = [(2, 3), (5, 1), (4, 6)]
    expected_bbox = ((2, 1), (5, 6))
    codeflash_output = polygon_to_bbox(polygon) # 2.17μs -> 1.18μs (84.0% faster)

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_polygon_1000_points():
    # Polygon with 1000 points forming a grid
    polygon = [(i, j) for i in range(10) for j in range(100)]
    expected_bbox = ((0, 0), (9, 99))
    codeflash_output = polygon_to_bbox(polygon) # 75.4μs -> 27.3μs (177% faster)

def test_large_polygon_random_points():
    # Polygon with 1000 random points within a known range
    import random
    random.seed(42)  # For deterministic results
    xs = [random.uniform(-1000, 1000) for _ in range(1000)]
    ys = [random.uniform(-500, 500) for _ in range(1000)]
    polygon = list(zip(xs, ys))
    expected_bbox = ((min(xs), min(ys)), (max(xs), max(ys)))
    codeflash_output = polygon_to_bbox(polygon) # 63.7μs -> 29.5μs (116% faster)

def test_large_polygon_all_same_point():
    # Polygon with 1000 identical points
    polygon = [(7.7, -3.3)] * 1000
    expected_bbox = ((7.7, -3.3), (7.7, -3.3))
    codeflash_output = polygon_to_bbox(polygon) # 63.3μs -> 32.2μs (96.8% faster)

def test_large_polygon_extreme_values():
    # Polygon with extreme float values
    polygon = [(-1e308, 1e308), (1e308, -1e308)] * 500
    expected_bbox = ((-1e308, -1e308), (1e308, 1e308))
    codeflash_output = polygon_to_bbox(polygon) # 62.9μs -> 44.0μs (43.1% faster)

def test_large_polygon_increasing_values():
    # Polygon with strictly increasing x and y values
    polygon = [(i, i * 2) for i in range(1000)]
    expected_bbox = ((0, 0), (999, 1998))
    codeflash_output = polygon_to_bbox(polygon) # 63.2μs -> 36.5μs (73.3% faster)

def test_large_polygon_decreasing_values():
    # Polygon with strictly decreasing x and y values
    polygon = [(999 - i, 1998 - 2 * i) for i in range(1000)]
    expected_bbox = ((0, 0), (999, 1998))
    codeflash_output = polygon_to_bbox(polygon) # 63.0μs -> 25.0μs (152% faster)

# ------------------------
# Error Handling Test Cases
# ------------------------

def test_empty_polygon_raises():
    # Polygon with no points should raise ValueError
    polygon = []
    with pytest.raises(ValueError):
        polygon_to_bbox(polygon) # 3.38μs -> 2.55μs (32.4% faster)

def test_invalid_polygon_type_raises():
    # Polygon with invalid data type (not tuple of numbers)
    polygon = [("a", "b"), (None, None), ([], {})]
    with pytest.raises(TypeError):
        polygon_to_bbox(polygon) # 2.99μs -> 2.28μs (31.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-polygon_to_bbox-mg7s2g9e and push.

Codeflash

The optimization replaces the original `zip(*polygon)` approach with a single-pass iterator-based algorithm that eliminates intermediate data structure allocations and reduces function call overhead.

**Key optimizations:**
1. **Eliminates `zip(*polygon)` overhead**: The original creates two temporary tuples containing all x and y coordinates, which requires memory allocation and unpacking operations. The optimized version processes coordinates directly without intermediate collections.

2. **Single-pass min/max computation**: Instead of calling `min()` and `max()` functions (which internally iterate through the data), the optimized version computes all four values (min_x, max_x, min_y, max_y) in one iteration using simple comparison operations.

3. **Reduces function call overhead**: The original makes 4 function calls (`min(x)`, `min(y)`, `max(x)`, `max(y)`), while the optimized version uses direct comparisons that are faster than function calls in Python.

**Performance characteristics based on test results:**
- **Small polygons (1-4 points)**: 50-115% speedup due to eliminated overhead
- **Medium polygons**: 60-95% speedup from avoiding temporary data structures
- **Large polygons (1000+ points)**: 73-177% speedup where single-pass iteration really shines, especially when coordinates have patterns (increasing/decreasing sequences show highest gains)
- **Edge cases**: Maintains identical error handling behavior for empty polygons while still providing 27-32% speedup

The optimization is particularly effective for larger polygons where memory allocation costs and multiple iterations become significant bottlenecks.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 1, 2025 09:22
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant