Inefficiency In Overlaps Call #127

Open
jgaupp-accuragen opened this issue Mar 28, 2023 · 1 comment

jgaupp-accuragen commented Mar 28, 2023

`overlaps` performs much worse than `overlap` when run against a tree with many small intervals.

The performance problem was found when calling `overlaps` against a tree of ~450,000 intervals, typically around 400 units wide. It was not obvious when calling `overlaps` against a tree of ~5,000 intervals, typically around 100,000 units wide.

Against the tree of 450K intervals, a large set of queries took ~15 minutes with `overlaps`, while `overlap` took ~12 seconds.

cProfile attributed the time to:

731923 1414.870 0.002 1414.905 0.002 /<venv path>/python3.10/site-packages/intervaltree/intervaltree.py:616(<genexpr>)

where column 2 is the total time (`tottime`, 1414.870 seconds) and the operation referenced at intervaltree.py:616 is:

return any(
    self.overlaps_point(bound)
    for bound in self.boundary_table
    if begin < bound < end
) 
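
The generator above visits every key in `boundary_table` on each call and only then filters by `begin < bound < end`, so each query costs time proportional to the number of boundaries in the tree. A minimal sketch of one way to avoid the full scan, assuming the boundaries are available in sorted order: use binary search to jump straight to the boundaries strictly inside the range. This is not the library's code; `bounds_strictly_inside` and `bounds_full_scan` are hypothetical names, and a plain sorted list stands in for `boundary_table`.

```python
from bisect import bisect_left, bisect_right


def bounds_strictly_inside(sorted_bounds, begin, end):
    """Return the boundaries b with begin < b < end using binary search.

    `sorted_bounds` must be a sorted list; it stands in for the tree's
    boundary table (an assumption made for this sketch).
    """
    lo = bisect_right(sorted_bounds, begin)  # first index with bound > begin
    hi = bisect_left(sorted_bounds, end)     # first index with bound >= end
    return sorted_bounds[lo:hi]


def bounds_full_scan(bounds, begin, end):
    """Same result, but touching every boundary, like the generator above."""
    return [b for b in bounds if begin < b < end]


bounds = list(range(0, 1000, 10))  # 0, 10, 20, ..., 990
fast = bounds_strictly_inside(bounds, 95, 130)
slow = bounds_full_scan(bounds, 95, 130)
```

Both functions return the same boundaries, but the bisect version does two binary searches plus a slice instead of a comparison per boundary, which matters when the tree holds hundreds of thousands of small intervals.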

jgaupp-accuragen commented

And thanks for your effort in creating this library.

It's provided an excellent balance of size and performance, when dictionaries proved prohibitive by memory use and linear interval searches were unwieldy or impossible to implement.
