Commit

closed values extended to include "both" and "neither" for .isdisjoin… (#34)

* closed values extended to include "both" and "neither" for .isdisjoint .contains .lookup .get_indexer

* degenerate intervals now allowed for .isdisjoint
venaturum committed Nov 4, 2021
1 parent 3de6dd3 commit daac8ba
Showing 8 changed files with 106 additions and 20 deletions.
4 changes: 3 additions & 1 deletion docs/getting_started/index.rst
@@ -30,12 +30,14 @@ Currently, there is a lack of such functionality in `pandas`, although it has be

An array of intervals can be interpreted in two different ways. It can be seen as a container for intervals, which are sets, or if the intervals are disjoint it may be seen as a set itself. Both interpretations are supported by the methods introduced by :mod:`piso`.

-The domain of the intervals can be either numerical, :class:`pandas.Timestamp` or :class:`pandas.Timedelta`. Currently, :mod:`piso` is limited to intervals which:
+The domain of the intervals can be either numerical, :class:`pandas.Timestamp` or :class:`pandas.Timedelta`. Currently, most of the set operations in :mod:`piso` are limited to intervals which:

 - have a non-zero length
 - have a finite length
 - are left-closed right-open, or right-closed left-open

+To check if these restrictions apply to a particular method, please consult the :ref:`api`.
+
Several :ref:`case studies <case_studies>` using :mod:`piso` can be found in the :ref:`user guide <user_guide>`. Further examples, and a detailed explanation of functionality, are provided in the :ref:`api`.


7 changes: 7 additions & 0 deletions docs/release_notes/index.rst
@@ -7,6 +7,13 @@ Release notes

ADD UNRELEASED CHANGES ABOVE THIS LINE

+The following methods were extended to accommodate intervals with *closed = "both"* or *"neither"*
+
+- :func:`piso.contains` (and :meth:`ArrayAccessor.contains() <piso.accessor.ArrayAccessor.contains>`)
+- :func:`piso.get_indexer` (and :meth:`ArrayAccessor.get_indexer() <piso.accessor.ArrayAccessor.get_indexer>`)
+- :func:`piso.lookup`
+- :func:`piso.isdisjoint` (and :meth:`ArrayAccessor.isdisjoint() <piso.accessor.ArrayAccessor.isdisjoint>`)
+
**v0.5.0 2021-11-02**

Added the following methods
6 changes: 6 additions & 0 deletions piso/docstrings/accessor.py
@@ -543,6 +543,9 @@ def join_params(list_of_param_strings):
isdisjoint_doc = (
"""
Indicates whether one, or more, sets are disjoint or not.
+*interval_array* must be left-closed or right-closed if *interval_arrays* is non-empty.
+If no arguments are provided then this restriction does not apply.
"""
+ template_doc
)
@@ -691,6 +694,8 @@ def join_params(list_of_param_strings):
Given a set of disjoint intervals (contained in the interval array that the accessor belongs to)
and a value, or vector, *x*, returns the index positions of the interval which contains each value in x.
+*interval_array* can be left-closed, right-closed, both or neither.
Parameters
----------
x : scalar, or array-like of scalars
@@ -739,6 +744,7 @@ def join_params(list_of_param_strings):
----------
x : scalar, or array-like of scalars
Values in *x* should belong to the same domain as the intervals in *interval_array*.
+The interval array may be left-closed, right-closed, both, or neither.
include_index : boolean, default True
Indicates whether to return a :class:`numpy.ndarray` or :class:`pandas.DataFrame` indexed
by *interval_array* and column names equal to *x*
10 changes: 8 additions & 2 deletions piso/docstrings/intervalarray.py
@@ -544,6 +544,9 @@ def join_params(list_of_param_strings):
isdisjoint_doc = (
"""
Indicates whether one, or more, sets are disjoint or not.
+*interval_array* must be left-closed or right-closed if *interval_arrays* is non-empty.
+If *interval_array* is the only argument then this restriction does not apply.
"""
+ template_doc
)
@@ -599,6 +602,7 @@ def join_params(list_of_param_strings):
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
Contains the (possibly overlapping) intervals which partially, or wholly cover the domain.
+May be left-closed, right-closed, both, or neither.
domain : :py:class:`tuple`, :class:`pandas.Interval`, :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`, optional
Specifies the domain over which to calculate the "coverage". If *domain* is `None`,
then the domain is considered to be the extremities of the intervals contained in *interval_array*
@@ -646,7 +650,7 @@ def join_params(list_of_param_strings):
Parameters
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
-Contains the (possibly overlapping) intervals.
+Contains the (possibly overlapping) intervals. Must be left-closed or right-closed.
domain : :py:class:`tuple`, :class:`pandas.Interval`, :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`, optional
Specifies the domain over which to calculate the "complement". If *domain* is `None`,
then the domain is considered to be the extremities of the intervals contained in *interval_array*
@@ -699,6 +703,8 @@ def join_params(list_of_param_strings):
Given a set of disjoint intervals and a value, or vector, *x* returns the
index positions of the interval which contains each value in x.
+*interval_array* can be left-closed, right-closed, both or neither.
Parameters
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
@@ -746,7 +752,7 @@ def join_params(list_of_param_strings):
Parameters
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
-Contains the intervals. Must be left-closed or right-closed.
+Contains the intervals. May be left-closed, right-closed, both, or neither.
x : scalar, or array-like of scalars
Values in *x* should belong to the same domain as the intervals in *interval_array*.
include_index : boolean, default True
2 changes: 1 addition & 1 deletion piso/docstrings/ndframe.py
@@ -67,7 +67,7 @@
----------
*frames_or_series : argument list of :class:`pandas.DataFrame` or :class:`pandas.Series`
May contain two or more arguments, all of which must be indexed by a
-:class:`pandas.IntervalIndex` containing disjoint intervals.
+:class:`pandas.IntervalIndex` containing disjoint intervals. The index can have any *closed* value.
Every :class:`pandas.Series` must have a name.
how : {"left", "right", "inner", "outer"}, default "left"
What sort of join to perform.
29 changes: 19 additions & 10 deletions piso/intervalarray.py
@@ -16,11 +16,12 @@ def _check_matched_closed(interval_arrays):
     assert closed_values.count(closed_values[0]) == len(closed_values)


-def _validate_array_of_intervals_arrays(*interval_arrays):
+def _validate_array_of_intervals_arrays(*interval_arrays, validate_intervals=True):
     assert len(interval_arrays) > 0
     _check_matched_closed(interval_arrays)
-    for arr in interval_arrays:
-        _validate_intervals(arr)
+    if validate_intervals:
+        for arr in interval_arrays:
+            _validate_intervals(arr)


 def _get_return_type(interval_array, return_type):
@@ -108,7 +109,9 @@ def symmetric_difference(

 @Appender(docstrings.isdisjoint_docstring, join="\n", indents=1)
 def isdisjoint(interval_array, *interval_arrays):
-    _validate_array_of_intervals_arrays(interval_array, *interval_arrays)
+    _validate_array_of_intervals_arrays(
+        interval_array, *interval_arrays, validate_intervals=bool(interval_arrays)
+    )
     if interval_arrays:
         stairs = _make_stairs(interval_array, *interval_arrays)
         result = stairs.max() <= 1
@@ -117,7 +120,10 @@ def isdisjoint(interval_array, *interval_arrays):
     else:
         arr = np.stack([interval_array.left.values, interval_array.right.values])
         arr = arr[arr[:, 0].argsort()]
-        result = np.all(arr[0, 1:] >= arr[1, :-1])
+        if interval_array.closed == "both":
+            result = np.all(arr[0, 1:] > arr[1, :-1])
+        else:
+            result = np.all(arr[0, 1:] >= arr[1, :-1])
     return result
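
The single-array branch of isdisjoint compares each interval's left endpoint with the previous interval's right endpoint, and switches to a strict comparison when closed == "both". The following is a minimal pure-Python sketch of that rule, not the library's implementation. It assumes intervals are plain (left, right) tuples, the helper name is_disjoint is hypothetical, and the sketch sorts the intervals by left endpoint itself before comparing neighbours.

```python
def is_disjoint(intervals, closed="right"):
    """Return True if the (left, right) intervals are pairwise disjoint.

    Hypothetical helper for illustration only; it is not part of piso.
    """
    # Sort by left endpoint so that only neighbouring intervals need comparing.
    ordered = sorted(intervals)
    for (_, prev_right), (next_left, _) in zip(ordered, ordered[1:]):
        if closed == "both":
            # Both endpoints are included, so intervals that touch overlap.
            if next_left <= prev_right:
                return False
        elif next_left < prev_right:
            # The shared endpoint is excluded on at least one side,
            # so touching intervals are still disjoint.
            return False
    return True


print(is_disjoint([(1, 2), (2, 3)], closed="both"))  # touching, closed on both sides
print(is_disjoint([(1, 2), (2, 3)], closed="left"))  # touching, half-open
```

The first call prints False and the second prints True, which matches the expectations for [(1, 2), (2, 3)] in the tests added by this commit.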


@@ -185,6 +191,7 @@ def coverage(interval_array, domain=None):

 @Appender(docstrings.complement_docstring, join="\n", indents=1)
 def complement(interval_array, domain=None):
+    _validate_intervals(interval_array)
     stepfunction = _interval_x_to_stairs(interval_array).invert()
     if isinstance(domain, (pd.IntervalIndex, pd.arrays.IntervalArray)):
         domain = _interval_x_to_stairs(domain)
@@ -200,11 +207,13 @@ def contains(interval_array, x, include_index=True):
     starts = interval_array.left.values
     ends = interval_array.right.values
     x = pd.Series(x).values
-    if interval_array.closed == "right":
-        result = np.less_equal.outer(x, ends) & np.greater.outer(x, starts)
-    else:
-        result = np.less.outer(x, ends) & np.greater_equal.outer(x, starts)
-    result = result.transpose()
+    right_compare = (
+        np.less_equal if interval_array.closed in ("right", "both") else np.less
+    )
+    left_compare = (
+        np.greater_equal if interval_array.closed in ("left", "both") else np.greater
+    )
+    result = (right_compare.outer(x, ends) & left_compare.outer(x, starts)).transpose()
     if include_index:
         return pd.DataFrame(result, index=interval_array, columns=x)
     return result
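
The new contains logic picks one comparison per endpoint from the closed value, rather than branching on "right" versus everything else. Here is a pure-Python sketch of the same selection, using the operator module in place of NumPy ufuncs. The function below is a hypothetical stand-in that takes (left, right) tuples, and the interval values are chosen only for illustration.

```python
import operator


def contains(intervals, xs, closed="right"):
    """Return one row per interval and one boolean per value in xs."""
    # The right endpoint is included only when the right side is closed.
    right_compare = operator.le if closed in ("right", "both") else operator.lt
    # The left endpoint is included only when the left side is closed.
    left_compare = operator.ge if closed in ("left", "both") else operator.gt
    return [
        [left_compare(x, left) and right_compare(x, right) for x in xs]
        for left, right in intervals
    ]


intervals = [(0, 4), (2, 5), (3, 6)]  # illustrative values
for closed in ("left", "right", "both", "neither"):
    print(closed, contains(intervals, [2, 4, 5], closed=closed))
```

Each row of the result corresponds to an interval and each column to a value, mirroring the transposed layout returned by the function in the diff.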
2 changes: 1 addition & 1 deletion piso/util.py
@@ -7,7 +7,7 @@ def _validate_intervals(interval_array):
     if not all(interval_array.length):  # test for degenerate intervals
         raise DegenerateIntervalError(interval_array)
     if interval_array.closed not in ("left", "right"):
-        raise ClosedValueError
+        raise ClosedValueError(interval_array.closed)


 def _interval_x_to_stairs(interval_array):
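
The change to _validate_intervals passes the offending closed value to the exception, so the error can report what was received. A small sketch of that pattern follows. The ClosedValueError class here is a stand-in written for this example, and piso's own class and message may be defined differently. The validate_closed helper is also hypothetical.

```python
class ClosedValueError(ValueError):
    """Stand-in exception for illustration; not piso's actual class."""

    def __init__(self, closed):
        # Keep the offending value so callers can inspect it.
        self.closed = closed
        super().__init__(f"closed must be 'left' or 'right', got {closed!r}")


def validate_closed(closed):
    """Raise ClosedValueError unless closed is 'left' or 'right'."""
    if closed not in ("left", "right"):
        raise ClosedValueError(closed)


try:
    validate_closed("both")
except ClosedValueError as err:
    print(err)
```

Raising the class without arguments, as the old line did, would leave the message empty, so the caller could not tell which value was rejected.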
66 changes: 61 additions & 5 deletions tests/test_single_interval_array.py
@@ -454,6 +454,7 @@ def make_date(x):
     return interval_array.from_arrays(
         interval_array.left.map(make_date),
         interval_array.right.map(make_date),
+        interval_array.closed,
     )


@@ -477,7 +478,7 @@ def make_date(x):
 )
 @pytest.mark.parametrize(
     "closed",
-    ["left", "right"],
+    ["left", "right", "neither"],
 )
 @pytest.mark.parametrize(
     "date_type",
@@ -487,14 +488,52 @@ def make_date(x):
     "how",
     ["supplied", "accessor", "package"],
 )
-def test_isdisjoint(interval_index, tuples, expected, closed, date_type, how):
+def test_isdisjoint_left_right_neither(
+    interval_index, tuples, expected, closed, date_type, how
+):

     interval_array = make_ia_from_tuples(interval_index, tuples, closed)
     interval_array = map_to_dates(interval_array, date_type)
     result = perform_op(interval_array, how=how, function=piso_intervalarray.isdisjoint)
     assert result == expected


+@pytest.mark.parametrize(
+    "interval_index",
+    [True, False],
+)
+@pytest.mark.parametrize(
+    "tuples, expected",
+    [
+        ([], True),
+        ([(1, 2), (2, 3)], False),
+        ([(1, 2), (3, 3)], True),
+        ([(1, 2), (3, 4)], True),
+        ([(1, 3), (2, 4)], False),
+        ([(1, 4), (2, 3)], False),
+        ([(1, 2), (2, 3), (3, 4)], False),
+        ([(1, 2), (3, 4), (5, 6)], True),
+        ([(1, 3), (2, 4), (5, 6)], False),
+        ([(1, 4), (2, 3), (5, 6)], False),
+    ],
+)
+@pytest.mark.parametrize(
+    "date_type",
+    ["timestamp", "numpy", "datetime", "timedelta", None],
+)
+@pytest.mark.parametrize(
+    "how",
+    ["supplied", "accessor", "package"],
+)
+def test_isdisjoint_both(interval_index, tuples, expected, date_type, how):
+
+    interval_array = make_ia_from_tuples(interval_index, tuples, "both")
+    interval_array = map_to_dates(interval_array, date_type)
+    print(interval_array)
+    result = perform_op(interval_array, how=how, function=piso_intervalarray.isdisjoint)
+    assert result == expected
+
+
 @pytest.mark.parametrize(
     "interval_index",
     [True, False],
@@ -632,8 +671,14 @@ def test_complement(interval_index, domain, expected_tuples, closed, how):
         (4, "left", -1),
         (3, "right", -1),
         (4, "right", 0),
+        (3, "both", 0),
+        (4, "both", 0),
+        (3, "neither", -1),
+        (4, "neither", -1),
         ([3, 9, 12], "left", np.array([0, 1, -1])),
         ([3, 9, 12], "right", np.array([-1, 1, -1])),
+        ([3, 9, 12], "both", np.array([0, 1, -1])),
+        ([3, 9, 12], "neither", np.array([-1, 1, -1])),
     ],
 )
 @pytest.mark.parametrize(
@@ -678,8 +723,12 @@ def test_get_indexer_exception(how):
     [
         (0, "left", [[True], [False], [False]]),
         (0, "right", [[False], [False], [False]]),
+        (0, "both", [[True], [False], [False]]),
+        (0, "neither", [[False], [False], [False]]),
         (6, "left", [[False], [False], [False]]),
         (6, "right", [[False], [False], [True]]),
+        (6, "neither", [[False], [False], [False]]),
+        (6, "both", [[False], [False], [True]]),
         (
             [2, 4, 5],
             "left",
@@ -690,6 +739,16 @@ def test_get_indexer_exception(how):
             "right",
             [[True, True, False], [False, True, True], [False, True, True]],
         ),
+        (
+            [2, 4, 5],
+            "both",
+            [[True, True, False], [True, True, True], [False, True, True]],
+        ),
+        (
+            [2, 4, 5],
+            "neither",
+            [[True, False, False], [False, True, False], [False, True, True]],
+        ),
     ],
 )
 @pytest.mark.parametrize(
@@ -709,9 +768,6 @@ def test_contains(interval_index, x, closed, expected, how, include_index):
         how=how,
         function=piso_intervalarray.contains,
     )
-    print(result)
-    print(ia)
-    print(x)
     if include_index:
         expected_result = pd.DataFrame(expected, index=ia, columns=np.array(x, ndmin=1))
         pd.testing.assert_frame_equal(result, expected_result, check_dtype=False)
