support for null intervals #59

mtivadar · 2017-08-22T15:33:40Z

it should be nice to support null intervals, like [4, 4]

chaimleib · 2017-09-12T15:37:31Z

There have been many requests to include points within the range search. I tried doing this once, but the algorithms for traversing and modifying the tree really depend on intervals containing some range. So, if you would like to include 4 in your interval tree, you could add the interval [4, 5), which would start at 4 inclusively and end at 5 without including it, which is how all Intervals work in the IntervalTree. They include the start value, but not the end value.

This behavior is analogous to string or array indexing:

>>> s = "abcdef"
>>> s[4:5]
"e"
>>> s[4:4]
""

>>> a = [0, 1, 2, 3, 4, 5]
>>> a[4:5]
[4]
>>> a[4:4]
[]

If your range limits are not restricted to discrete values, for example, they can be reals with fractional parts, then for now, I would suggest coding around sortedcontainers.SortedDict. This is how I would like to add point support in a future release, but to a new class with a different name, since this is different in behavior from the current interval tree.

kotoroshinoto · 2018-04-11T19:00:16Z

in the case where left is included and right is excluded a position described as i:i would refer to the location between i-1 and i, not including either i-1 or i themselves.

to give context for this, a description of an insertion mutation in a genome would refer to a position between nucleotides as the location of origin for the variant, because that "range" doesn't exist in the reference sequence, but rather has a 'location' that lies between existing positions

such a position would NOT be encompassed by ranges that touch either side of it, but would be if they crossed over it.

lets take position 3:3 for instance. It is a "between" position that lies in the potential insertion gap between 2 and 3, not including 2 OR 3. It would NOT overlap with either 1:3 (which technically ends at 2 if we're dealing only with integers) or even with 3:6 (because this range begins at 3 itself, not the gap between 2 and 3).

2:3 would overlap it, 3:3 explicitly would as well.

chaimleib · 2018-12-17T15:53:42Z

In order to support using floats and other non-integer ranges, Intervals include the lower bound, but not the upper bound, so Interval(2, 3) is 2 ≤ x < 3. This is a deep dependency in the search, insertion and balancing algorithms, and other methods of IntervalTree.

If you tried to create Interval(3, 3), you would get 3 ≤ x < 3. There is no x that satisfies this inequality, since x = 3 won't satisfy x < 3. The interval includes no points, and is equivalent to the null set.

If I were to make a new kind of tree where point searches did not include boundaries, weird and awkward behavior would result. For example, if I asked it "What events do I have at 3 o'clock?", an event beginning at 3 would not appear.

There do exist other libraries that support all types of intervals, inclusive or not inclusive of either the lower or upper bound. However, the added flexibility results in a very complicated API and even more complicated internal logic for dealing with intervals. I opted to only support half-open intervals, as defined above.

gbuckholtz · 2020-06-04T00:04:56Z

It is useful for me to have null intervals in the context I am using the intervaltree. I am making an interval tree of CIDR format IP addresses. 1.0.254.0/23 for example. There is a degenerate case of 1.0.254.0/32 that boils down to a single IP address rather than a range.

I did read your first reply and bumped the upper range of the degenerate case to be [lower:lower+1] to set the interval.

GillesJ · 2020-06-05T09:17:12Z

Handling of null intervals is a better fit conceptually where the intervals represent unit positions in data sets.
My use-case is in natural language processing where sequences of words in text are labelled. Often you want to assess if annotations in running text overlap. Any text dataset has a lot of one-word length annotations. [Word spans] are typically represented by intervals of word positions, e.g. [0,1] = [Word spans]. As mentioned, you can transform the positional representation to boundary representation by "end_position + 1" in combination with exclusive right boundary matching.
However the programmer has to always be cognizant of this representative transformation.

I used to use Sympy for text problems because it is a better fit conceptually as it casts null intervals to FiniteSet objects. But Sympy is way too slow and memory intensive on large datasets. Intervaltree.py is fast and light, so I will keep using this package over Sympy.

chaimleib closed this as completed Dec 18, 2018

chaimleib pinned this issue Dec 18, 2018

chaimleib added the Q&A Questions and answers label Mar 5, 2019

cryptoluks mentioned this issue Nov 30, 2021

Firewall-cmd is extremely slow when populating large ipsets firewalld/firewalld#881

Closed

Thomas131 mentioned this issue Sep 24, 2024

Document on how to correctly use it for discrete value intervals (including single value intervals) #143

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support for null intervals #59

support for null intervals #59

mtivadar commented Aug 22, 2017

chaimleib commented Sep 12, 2017 •

edited

Loading

kotoroshinoto commented Apr 11, 2018 •

edited

Loading

chaimleib commented Dec 17, 2018 •

edited

Loading

gbuckholtz commented Jun 4, 2020 •

edited

Loading

GillesJ commented Jun 5, 2020

support for null intervals #59

support for null intervals #59

Comments

mtivadar commented Aug 22, 2017

chaimleib commented Sep 12, 2017 • edited Loading

kotoroshinoto commented Apr 11, 2018 • edited Loading

chaimleib commented Dec 17, 2018 • edited Loading

gbuckholtz commented Jun 4, 2020 • edited Loading

GillesJ commented Jun 5, 2020

chaimleib commented Sep 12, 2017 •

edited

Loading

kotoroshinoto commented Apr 11, 2018 •

edited

Loading

chaimleib commented Dec 17, 2018 •

edited

Loading

gbuckholtz commented Jun 4, 2020 •

edited

Loading