Description
The predict() function currently builds the output dictionary using:
```python
for index, row in df_data.iterrows():
```
Inside the loop, it repeatedly filters the entire dataframe to collect values for each (True X, True Y) pair:
```python
df_data[
    (df_data["True X"] == row["True X"]) &
    (df_data["True Y"] == row["True Y"])
]
```
This causes:
- Repeated work for duplicated (True X, True Y) combinations
- Significant slowdown on medium/large datasets
- Unnecessary memory allocations (every filter builds new boolean masks and a new sub-dataframe)
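To make the cost concrete, here is a self-contained approximation of the current pattern. The loop and the filter follow the snippets in this issue. The dictionary keyed by `True XY` and the field names `predicted_x`, `predicted_y`, `PrecisionSD` and `Accuracy` are hypothetical stand-ins for whatever predict() actually stores, and the metrics are assumed to be read from precomputed `precision_xy` / `accuracy_xy` columns.

```python
import pandas as pd


def build_output_iterrows(df_data: pd.DataFrame) -> dict:
    """Approximation of the current row-by-row pattern (hypothetical field names)."""
    output = {}
    for index, row in df_data.iterrows():
        # Full-dataframe boolean filter on every iteration: n rows x n comparisons
        subset = df_data[
            (df_data["True X"] == row["True X"]) &
            (df_data["True Y"] == row["True Y"])
        ]
        # Rows sharing a truth point rebuild and overwrite the same entry
        output[row["True XY"]] = {
            "predicted_x": subset["Predicted X"].tolist(),
            "predicted_y": subset["Predicted Y"].tolist(),
            "PrecisionSD": subset["precision_xy"].iloc[0],
            "Accuracy": subset["accuracy_xy"].iloc[0],
        }
    return output
```

With n rows, each iteration compares two columns across all n rows, so the loop performs on the order of n² comparisons, and every duplicate of a truth point repeats the same work only to overwrite an existing entry. If the current code computes the metrics inside the loop instead of reading precomputed columns, that cost is repeated as well.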
Root Cause
The algorithm iterates row by row and performs a full-dataframe filter on every iteration, even though the rows can already be grouped by their truth coordinates in a single pass.
Proposed Solution
Refactor the dictionary-building logic to:
- Group the data once using `df_data.groupby("True XY")`.
- Iterate over each group exactly once.
- Extract from each group:
  - Predicted X
  - Predicted Y
  - Precomputed precision_xy
  - Precomputed accuracy_xy
- Build the nested output dictionary directly from the groups.
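A minimal sketch of the grouped version, assuming `df_data` has the columns named in this issue (`True X`, `True Y`, `True XY`, `Predicted X`, `Predicted Y`, `precision_xy`, `accuracy_xy`) and that `precision_xy` and `accuracy_xy` are constant within each truth group. The output keys (`predicted_x`, `predicted_y`, `PrecisionSD`, `Accuracy`) are hypothetical placeholders; the real layout must match what predict() returns today. If no `True XY` column exists, `df_data.groupby(["True X", "True Y"], sort=False)` groups the same rows but yields `(x, y)` tuple keys instead.

```python
import pandas as pd


def build_output_grouped(df_data: pd.DataFrame) -> dict:
    """Build the per-truth-point dictionary with a single groupby pass.

    Key and field names are hypothetical; match them to predict()'s real output.
    """
    output = {}
    # sort=False keeps groups in order of first appearance, like the row loop
    for true_xy, group in df_data.groupby("True XY", sort=False):
        output[true_xy] = {
            "predicted_x": group["Predicted X"].tolist(),
            "predicted_y": group["Predicted Y"].tolist(),
            # Assumed constant within a group, so the first value is representative
            "PrecisionSD": group["precision_xy"].iloc[0],
            "Accuracy": group["accuracy_xy"].iloc[0],
        }
    return output


# Usage with a tiny hypothetical dataset: two rows share the truth point (0, 0)
df = pd.DataFrame({
    "True X": [0, 0, 1],
    "True Y": [0, 0, 1],
    "Predicted X": [0.1, -0.2, 0.9],
    "Predicted Y": [0.0, 0.1, 1.2],
    "precision_xy": [0.5, 0.5, 0.0],
    "accuracy_xy": [0.1, 0.1, 0.2],
})
df["True XY"] = df["True X"].astype(str) + "," + df["True Y"].astype(str)
result = build_output_grouped(df)
```

Each row is visited once while grouping and once while its group is sliced, and no boolean mask over the full dataframe is built per row. Passing `sort=False` is a deliberate choice so the insertion order of the resulting dictionary follows first appearance, which is what the row loop produces.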
Expected Benefits
- Reduce complexity from O(n²) → O(n)
- Eliminate duplicated computations
- Improve scalability for large datasets
- Cleaner and more maintainable code
Acceptance Criteria
- No iterrows() usage in this section.
- No dataframe filtering inside loops.
- Output structure remains identical.
- Metrics (PrecisionSD, Accuracy) unchanged.
- Performance improvement confirmed on larger datasets.
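The "output identical" and "performance confirmed" criteria could be checked with a small harness like the one below. It is a sketch under stated assumptions: `make_synthetic_data` and `compare_builders` are hypothetical helpers, the synthetic columns mirror the names used in this issue, and the current and refactored builders are passed in as plain functions that take the dataframe and return the output dictionary.

```python
import time
from typing import Callable

import numpy as np
import pandas as pd


def make_synthetic_data(n_rows: int, n_points: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical test data: noisy predictions around n_points truth locations."""
    rng = np.random.default_rng(seed)
    point = rng.integers(0, n_points, size=n_rows)  # truth-point index per row
    true_x = point.astype(float)
    true_y = 2.0 * point
    # Per-point metrics broadcast to rows, so they are constant within a group
    precision = rng.random(n_points)
    accuracy = rng.random(n_points)
    return pd.DataFrame({
        "True X": true_x,
        "True Y": true_y,
        "True XY": [f"{x},{y}" for x, y in zip(true_x, true_y)],
        "Predicted X": true_x + rng.normal(0.0, 0.5, n_rows),
        "Predicted Y": true_y + rng.normal(0.0, 0.5, n_rows),
        "precision_xy": precision[point],
        "accuracy_xy": accuracy[point],
    })


def compare_builders(
    old_fn: Callable[[pd.DataFrame], dict],
    new_fn: Callable[[pd.DataFrame], dict],
    df: pd.DataFrame,
) -> tuple[float, float]:
    """Assert both builders return the same dict; return (old_secs, new_secs)."""
    start = time.perf_counter()
    old_out = old_fn(df)
    old_secs = time.perf_counter() - start

    start = time.perf_counter()
    new_out = new_fn(df)
    new_secs = time.perf_counter() - start

    # == compares keys and nested values; NaN values would compare unequal
    assert old_out == new_out, "refactored output differs from the current output"
    # dict == ignores insertion order, so check key order separately
    assert list(old_out) == list(new_out), "key order changed"
    return old_secs, new_secs
```

For example, `compare_builders(current_builder, grouped_builder, make_synthetic_data(20_000, 200))`, where both builder names are placeholders, fails loudly if the outputs diverge and returns the two wall-clock timings for the performance check. Single timings are noisy, so repeating the call a few times gives a more trustworthy comparison.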