⚡️ Speed up function get_annual_indicator_names by 33%
#14
📄 33% (0.33x) speedup for `get_annual_indicator_names` in `qnt/data/secgov_fundamental.py`
⏱️ Runtime: 693 microseconds → 522 microseconds (best of 173 runs)
📝 Explanation and details
The optimized code achieves a roughly 33% speedup by replacing each O(M) list membership test with an O(1) set lookup, which reduces the overall membership check from O(N*M) to O(N), and by using more efficient set operations for the subset test.
Key optimizations:
- Set conversion for O(1) lookups: Converts `GLOBAL_ANNUAL_US_GAAPS` (a list) to a set once at the beginning. This changes each `fact in GLOBAL_ANNUAL_US_GAAPS` lookup from an O(M) list scan to an O(1) hash-table lookup, where M is the size of the GAAP list (~23 elements).
- Set subset operation: Replaces the generator expression `all(fact in GLOBAL_ANNUAL_US_GAAPS for fact in facts)` with `set(facts).issubset(global_annual_us_gaaps_set)`. This leverages optimized C-level set operations instead of Python loops.
- Empty facts handling: Explicitly handles the edge case where `facts` is empty to preserve the original logic (empty facts should be included, as `all()` on empty iterables returns `True`).

Why this works:
The original code spent ~69% of its time (2.02 ms out of 2.94 ms total) on the membership-checking line. With 1,240 indicators tested and potentially multiple facts per indicator, the O(N*M) cost of repeated list lookups became the bottleneck. Set operations are implemented in C and are well optimized for exactly this use case.
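To make the change concrete, here is a minimal before/after sketch of the pattern described above. It is not the code from `qnt/data/secgov_fundamental.py`: the `INDICATORS` table, its `"facts"` key, the sample GAAP tags, and both function names are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-ins; the real list and indicator table live in
# qnt/data/secgov_fundamental.py and are not reproduced here.
GLOBAL_ANNUAL_US_GAAPS = [
    "us-gaap:Revenues",
    "us-gaap:NetIncomeLoss",
    "us-gaap:Assets",
]

INDICATORS = {
    "total_revenue": {"facts": ["us-gaap:Revenues"]},
    "roa": {"facts": ["us-gaap:NetIncomeLoss", "us-gaap:Assets"]},
    "eps": {"facts": ["us-gaap:EarningsPerShareDiluted"]},
    "no_facts": {"facts": []},
}


def annual_names_original(indicators, annual_gaaps):
    # Each `fact in annual_gaaps` scans the list: O(M) per fact.
    return [
        name
        for name, spec in indicators.items()
        if all(fact in annual_gaaps for fact in spec["facts"])
    ]


def annual_names_optimized(indicators, annual_gaaps):
    # Build the set once; each membership check is then O(1) on average.
    annual_set = set(annual_gaaps)
    names = []
    for name, spec in indicators.items():
        facts = spec["facts"]
        # Empty facts are kept, matching all() on an empty iterable.
        if not facts or set(facts).issubset(annual_set):
            names.append(name)
    return names


print(annual_names_original(INDICATORS, GLOBAL_ANNUAL_US_GAAPS))
print(annual_names_optimized(INDICATORS, GLOBAL_ANNUAL_US_GAAPS))
```

Both functions return the same names in the same order, since dictionaries preserve insertion order. The indicator whose fact is missing from the annual list is dropped, and the one with no facts is kept.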
Test case performance:
The optimization shows consistent 15-45% improvements across all test scenarios, with the largest gains (35-45%) occurring in edge cases with empty facts or when the GAAPS list is cleared, suggesting the set conversion overhead is minimal compared to the lookup savings.
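A comparison like this can be approximated locally with `timeit`. The sketch below is not the PR's test suite: the indicator names, the synthetic GAAP tags, and the mix of empty and non-matching facts are invented for illustration, with sizes loosely taken from the figures quoted above (about 23 GAAP entries, 1,240 indicators). Absolute timings will vary by machine.

```python
import timeit


def build_indicators(n, gaaps):
    # Synthetic data: every 5th indicator has no facts, and every 3rd of
    # the rest carries one tag that is absent from the annual list.
    out = {}
    for i in range(n):
        if i % 5 == 0:
            facts = []
        else:
            facts = [gaaps[i % len(gaaps)], gaaps[(i * 7) % len(gaaps)]]
            if i % 3 == 0:
                facts.append("us-gaap:NotAnnual")
        out[f"ind_{i}"] = facts
    return out


def list_scan(indicators, gaaps):
    # Original pattern: membership tested against a list.
    return [k for k, facts in indicators.items()
            if all(f in gaaps for f in facts)]


def set_subset(indicators, gaaps):
    # Optimized pattern: one set conversion, then subset checks.
    gaap_set = set(gaaps)
    return [k for k, facts in indicators.items()
            if not facts or set(facts).issubset(gaap_set)]


gaaps = [f"us-gaap:Tag{i}" for i in range(23)]
indicators = build_indicators(1240, gaaps)

t_list = timeit.timeit(lambda: list_scan(indicators, gaaps), number=200)
t_set = timeit.timeit(lambda: set_subset(indicators, gaaps), number=200)
print(f"list scan:  {t_list:.4f} s for 200 calls")
print(f"set subset: {t_set:.4f} s for 200 calls")
```

Checking that both variants return identical results on the same input before comparing timings guards against measuring a faster but behaviorally different function.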
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-get_annual_indicator_names-mgk4g4nl` and push.