
Commit

Merge branch 'master' into release
venaturum committed Nov 4, 2021
2 parents 7cceedb + 9b6658c commit 0ac48ad
Showing 11 changed files with 144 additions and 27 deletions.
35 changes: 32 additions & 3 deletions README.md
@@ -15,7 +15,7 @@

# piso - pandas interval set operations

**piso** exists to bring set operations (union, intersection, difference + more) to [pandas'](https://pandas.pydata.org/) interval classes, specifically
**piso** exists to bring set operations (union, intersection, difference + more), analytical methods, and lookup and join functionality to [pandas'](https://pandas.pydata.org/) interval classes, specifically

- pandas.Interval
- pandas.arrays.IntervalArray
@@ -36,11 +36,40 @@ Currently, there is a lack of such functionality in pandas, although it has been
<IntervalArray>
[(3, 4]]
Length: 1, closed: right, dtype: interval[int64]

>>> arr.piso.contains([2, 3, 5])
            2      3      5
(1, 5]   True   True   True
(3, 6]  False  False   True
(2, 4]  False   True  False

>>> df = pd.DataFrame(
...     {"A":[4,3], "B":["x","y"]},
...     index=pd.IntervalIndex.from_tuples([(1,3), (5,7)]),
... )

>>> s = pd.Series(
...     [True, False],
...     index=pd.IntervalIndex.from_tuples([(2,4), (5,6)]),
...     name="C",
... )

>>> piso.join(df, s)
        A  B      C
(1, 2]  4  x    NaN
(2, 3]  4  x   True
(5, 6]  3  y  False
(6, 7]  3  y    NaN

>>> piso.join(df, s, how="inner")
        A  B      C
(2, 3]  4  x   True
(5, 6]  3  y  False
```

The domain of the intervals can be either numerical, `pandas.Timestamp` or `pandas.Timedelta`.

A small [case study](https://piso.readthedocs.io/en/latest/user_guide/calendar.html) using piso can be found in the [user guide](https://piso.readthedocs.io/en/latest/user_guide/index.html). Further examples, and a detailed explanation of functionality, are provided in the [API reference](https://piso.readthedocs.io/en/latest/reference/index.html).
Several [case studies](https://piso.readthedocs.io/en/latest/user_guide/case_studies/index.html) using piso can be found in the [user guide](https://piso.readthedocs.io/en/latest/user_guide/index.html). Further examples, and a detailed explanation of functionality, are provided in the [API reference](https://piso.readthedocs.io/en/latest/reference/index.html).

Visit [https://piso.readthedocs.io](https://piso.readthedocs.io/) for the documentation.

@@ -70,7 +99,7 @@ This project is licensed under the [MIT License](https://github.com/staircase-de

## Acknowledgments

Currently, piso is a pure-Python implementation which relies heavily on [staircase](https://www.staircase.dev) and [pandas](https://pandas.pydata.org/). It is clearly designed to operate as part of the *pandas ecosystem*. The colours for the piso logo have been assimilated from pandas as a homage, and are not intended to imply an affiliation with, or endorsement by, pandas.
Currently, piso is a pure-Python implementation which relies heavily on [staircase](https://www.staircase.dev) and [pandas](https://pandas.pydata.org/). It is designed to operate as part of the *pandas ecosystem*. The colours for the piso logo have been assimilated from pandas as a homage, and are not intended to imply an affiliation with, or endorsement by, pandas.

Additionally, two classes have been borrowed, almost verbatim, from the pandas source code:

6 changes: 4 additions & 2 deletions docs/getting_started/index.rst
@@ -20,7 +20,7 @@ To install the latest version through conda-forge::
Package overview
----------------

`piso` exists to bring set operations to :mod:`pandas` interval classes, specifically
`piso` exists to bring set operations (union, intersection, difference + more), analytical methods, and lookup and join functionality to :mod:`pandas` interval classes, specifically

- :class:`pandas.Interval`
- :class:`pandas.arrays.IntervalArray`
@@ -30,12 +30,14 @@ Currently, there is a lack of such functionality in `pandas`, although it has be

An array of intervals can be interpreted in two different ways. It can be seen as a container for intervals, which are sets, or if the intervals are disjoint it may be seen as a set itself. Both interpretations are supported by the methods introduced by :mod:`piso`.

The domain of the intervals can be either numerical, :class:`pandas.Timestamp` or :class:`pandas.Timedelta`. Currently, :mod:`piso` is limited to intervals which:
The domain of the intervals can be either numerical, :class:`pandas.Timestamp` or :class:`pandas.Timedelta`. Currently, most of the set operations in :mod:`piso` are limited to intervals which:

- have a non-zero length
- have a finite length
- are left-closed right-open, or right-closed left-open

To check if these restrictions apply to a particular method, please consult the :ref:`api`.

Several :ref:`case studies <case_studies>` using :mod:`piso` can be found in the :ref:`user guide <user_guide>`. Further examples, and a detailed explanation of functionality, are provided in the :ref:`api`.
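The restrictions listed above can be checked before calling a set operation. Below is a minimal sketch, assuming intervals are plain `(left, right)` float pairs plus a pandas-style *closed* string; `check_restrictions` is a hypothetical name, not part of piso's API. piso's own validation, `_validate_intervals` in `piso/util.py`, raises `DegenerateIntervalError` and `ClosedValueError` instead of the generic `ValueError` used here.

```python
import math


def check_restrictions(intervals, closed):
    """Hypothetical check: raise ValueError if any set-operation restriction is violated."""
    # Most set operations need left-closed or right-closed intervals.
    if closed not in ("left", "right"):
        raise ValueError(f"closed must be 'left' or 'right', got {closed!r}")
    for left, right in intervals:
        # An infinite endpoint means an infinite length.
        if math.isinf(left) or math.isinf(right):
            raise ValueError(f"interval ({left}, {right}) has infinite length")
        # Zero-length (degenerate) intervals are not allowed.
        if left == right:
            raise ValueError(f"interval ({left}, {right}) has zero length")


check_restrictions([(0.0, 1.0), (2.0, 5.0)], "right")  # passes silently
```

By contrast, `check_restrictions([(1.0, 1.0)], "left")` or `check_restrictions([(0.0, 1.0)], "both")` raises `ValueError`.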


2 changes: 1 addition & 1 deletion docs/index.rst
@@ -16,7 +16,7 @@

.. rst-class:: center

Pandas Interval Set Operations: methods for set operations for pandas' Interval, IntervalArray and IntervalIndex
Pandas Interval Set Operations: methods for set operations, analytics, lookups and joins on pandas' Interval, IntervalArray and IntervalIndex

.. image:: img/powered_by_staircase.svg
:target: https://www.staircase.dev
9 changes: 9 additions & 0 deletions docs/release_notes/index.rst
@@ -5,6 +5,15 @@ Release notes
========================


**v0.6.0 2021-11-05**

The following methods were extended to accommodate intervals with *closed = "both"* or *"neither"*:

- :func:`piso.contains` (and :meth:`ArrayAccessor.contains() <piso.accessor.ArrayAccessor.contains>`)
- :func:`piso.get_indexer` (and :meth:`ArrayAccessor.get_indexer() <piso.accessor.ArrayAccessor.get_indexer>`)
- :func:`piso.lookup`
- :func:`piso.isdisjoint` (and :meth:`ArrayAccessor.isdisjoint() <piso.accessor.ArrayAccessor.isdisjoint>`)

**v0.5.0 2021-11-02**

Added the following methods
6 changes: 6 additions & 0 deletions piso/docstrings/accessor.py
@@ -543,6 +543,9 @@ def join_params(list_of_param_strings):
isdisjoint_doc = (
"""
Indicates whether one, or more, sets are disjoint or not.
*interval_array* must be left-closed or right-closed if *interval_arrays* is non-empty.
If no arguments are provided, then this restriction does not apply.
"""
+ template_doc
)
@@ -691,6 +694,8 @@ def join_params(list_of_param_strings):
Given a set of disjoint intervals (contained in the interval array that the accessor belongs to)
and a value, or vector, *x*, returns the index positions of the interval which contains each value in x.
*interval_array* can be left-closed, right-closed, both or neither.
Parameters
----------
x : scalar, or array-like of scalars
@@ -739,6 +744,7 @@ def join_params(list_of_param_strings):
----------
x : scalar, or array-like of scalars
Values in *x* should belong to the same domain as the intervals in *interval_array*.
May be left-closed, right-closed, both, or neither.
include_index : boolean, default True
Indicates whether to return a :class:`numpy.ndarray` or :class:`pandas.DataFrame` indexed
by *interval_array* and column names equal to *x*
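The `get_indexer` docstrings in this commit describe a lookup over disjoint intervals that now accepts any *closed* value. A minimal sketch of such a lookup follows, assuming plain `(left, right)` tuples and, as in pandas' own `get_indexer`, a result of `-1` for values that fall in no interval (an assumption, not something this diff confirms). `get_indexer_sketch` is a hypothetical name, and binary search over sorted left endpoints is one reasonable strategy, not necessarily piso's.

```python
import bisect


def _interval_contains(left, right, closed, x):
    # Whether each endpoint is included depends on *closed*.
    left_ok = left <= x if closed in ("left", "both") else left < x
    right_ok = x <= right if closed in ("right", "both") else x < right
    return left_ok and right_ok


def get_indexer_sketch(intervals, closed, values):
    """Hypothetical lookup: position of the interval containing each value, else -1."""
    # Sort positions by left endpoint so that bisect can be used.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    sorted_lefts = [intervals[i][0] for i in order]
    result = []
    for x in values:
        # k is the last interval (in sorted order) whose left endpoint is <= x.
        k = bisect.bisect_right(sorted_lefts, x) - 1
        position = -1
        # For disjoint, non-degenerate intervals only k, or k - 1 when x sits
        # on a shared endpoint, can contain x.
        for j in (k, k - 1):
            if j >= 0 and _interval_contains(*intervals[order[j]], closed, x):
                position = order[j]
                break
        result.append(position)
    return result


print(get_indexer_sketch([(1, 3), (3, 5), (7, 8)], "right", [3, 3.5, 6, 1, 8]))
# [0, 1, -1, -1, 2]
```

After the initial sort, each lookup costs O(log n); the *closed* value only affects the final endpoint comparison.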
10 changes: 8 additions & 2 deletions piso/docstrings/intervalarray.py
@@ -544,6 +544,9 @@ def join_params(list_of_param_strings):
isdisjoint_doc = (
"""
Indicates whether one, or more, sets are disjoint or not.
*interval_array* must be left-closed or right-closed if *interval_arrays* is non-empty.
If *interval_array* is the only argument then this restriction does not apply.
"""
+ template_doc
)
@@ -599,6 +602,7 @@ def join_params(list_of_param_strings):
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
Contains the (possibly overlapping) intervals which partially or wholly cover the domain.
May be left-closed, right-closed, both, or neither.
domain : :py:class:`tuple`, :class:`pandas.Interval`, :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`, optional
Specifies the domain over which to calculate the "coverage". If *domain* is `None`,
then the domain is considered to be the extremities of the intervals contained in *interval_array*
@@ -646,7 +650,7 @@ def join_params(list_of_param_strings):
Parameters
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
Contains the (possibly overlapping) intervals.
Contains the (possibly overlapping) intervals. Must be left-closed or right-closed.
domain : :py:class:`tuple`, :class:`pandas.Interval`, :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`, optional
Specifies the domain over which to calculate the "complement". If *domain* is `None`,
then the domain is considered to be the extremities of the intervals contained in *interval_array*
@@ -699,6 +703,8 @@ def join_params(list_of_param_strings):
Given a set of disjoint intervals and a value, or vector, *x*, returns the
index positions of the interval which contains each value in x.
*interval_array* can be left-closed, right-closed, both or neither.
Parameters
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
@@ -746,7 +752,7 @@ def join_params(list_of_param_strings):
Parameters
----------
interval_array : :class:`pandas.IntervalIndex` or :class:`pandas.arrays.IntervalArray`
Contains the intervals. Must be left-closed or right-closed.
Contains the intervals. May be left-closed, right-closed, both, or neither.
x : scalar, or array-like of scalars
Values in *x* should belong to the same domain as the intervals in *interval_array*.
include_index : boolean, default True
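The coverage docstring now states that *interval_array* may have any *closed* value, which is natural: including or excluding endpoints does not change a length. A minimal sketch of the calculation for a single `(lo, hi)` domain follows, assuming plain numeric tuples; `coverage_sketch` is a hypothetical name and piso's actual implementation may differ.

```python
def coverage_sketch(intervals, domain):
    """Hypothetical: fraction of *domain* = (lo, hi) covered by the union of *intervals*."""
    lo, hi = domain
    covered = 0.0
    run_start = run_end = None
    # Walk the intervals in order of left endpoint, clipping each to the domain
    # and merging overlapping or touching intervals into runs.
    for left, right in sorted(intervals):
        left, right = max(left, lo), min(right, hi)
        if left >= right:
            continue  # nothing of this interval lies inside the domain
        if run_end is None or left > run_end:
            if run_end is not None:
                covered += run_end - run_start  # close the previous run
            run_start, run_end = left, right
        else:
            run_end = max(run_end, right)  # extend the current run
    if run_end is not None:
        covered += run_end - run_start
    return covered / (hi - lo)


print(coverage_sketch([(0, 4), (2, 6)], (0, 10)))  # 0.6
```

Overlapping intervals are merged before measuring, so the overlap between `(0, 4)` and `(2, 6)` is counted once.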
2 changes: 1 addition & 1 deletion piso/docstrings/ndframe.py
@@ -67,7 +67,7 @@
----------
*frames_or_series : argument list of :class:`pandas.DataFrame` or :class:`pandas.Series`
May contain two or more arguments, all of which must be indexed by a
:class:`pandas.IntervalIndex` containing disjoint intervals.
:class:`pandas.IntervalIndex` containing disjoint intervals. The index can have any *closed* value.
Every :class:`pandas.Series` must have a name.
how : {"left", "right", "inner", "outer"}, default "left"
What sort of join to perform.
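The README example shows `piso.join` splitting intervals at every endpoint that appears in either index, so `(1, 3]` joined with `(2, 4]` becomes `(1, 2]` and `(2, 3]`. A minimal sketch of how the resulting index could be computed follows, assuming plain `(left, right)` tuples. `join_index_sketch` is a hypothetical name, not piso's implementation. `"left"` and `"inner"` reproduce the README output, while the behaviour of `"right"` and `"outer"` here is an assumption by analogy.

```python
def join_index_sketch(left_index, right_index, how="left"):
    """Hypothetical: the interval index produced by joining two disjoint indexes."""
    # Every endpoint from either index is a potential split point.
    breaks = sorted({p for interval in left_index + right_index for p in interval})

    def pieces(intervals):
        # Split each interval at the breakpoints strictly inside it.
        out = []
        for lo, hi in intervals:
            cuts = [lo] + [b for b in breaks if lo < b < hi] + [hi]
            out.extend(zip(cuts[:-1], cuts[1:]))
        return out

    def covered_by(piece, intervals):
        lo, hi = piece
        return any(a <= lo and hi <= b for a, b in intervals)

    if how == "left":
        return pieces(left_index)
    if how == "right":
        return pieces(right_index)
    if how == "inner":
        # Keep only the pieces that also lie inside some right-hand interval.
        return [p for p in pieces(left_index) if covered_by(p, right_index)]
    if how == "outer":
        return sorted(set(pieces(left_index)) | set(pieces(right_index)))
    raise ValueError(f"unknown join type: {how!r}")


print(join_index_sketch([(1, 3), (5, 7)], [(2, 4), (5, 6)]))
# [(1, 2), (2, 3), (5, 6), (6, 7)]
print(join_index_sketch([(1, 3), (5, 7)], [(2, 4), (5, 6)], how="inner"))
# [(2, 3), (5, 6)]
```

The column values for each piece would then be taken from whichever original interval contains it, with `NaN` where no interval on that side does.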
29 changes: 19 additions & 10 deletions piso/intervalarray.py
@@ -16,11 +16,12 @@ def _check_matched_closed(interval_arrays):
    assert closed_values.count(closed_values[0]) == len(closed_values)


def _validate_array_of_intervals_arrays(*interval_arrays):
def _validate_array_of_intervals_arrays(*interval_arrays, validate_intervals=True):
    assert len(interval_arrays) > 0
    _check_matched_closed(interval_arrays)
    for arr in interval_arrays:
        _validate_intervals(arr)
    if validate_intervals:
        for arr in interval_arrays:
            _validate_intervals(arr)


def _get_return_type(interval_array, return_type):
@@ -108,7 +109,9 @@ def symmetric_difference(

@Appender(docstrings.isdisjoint_docstring, join="\n", indents=1)
def isdisjoint(interval_array, *interval_arrays):
    _validate_array_of_intervals_arrays(interval_array, *interval_arrays)
    _validate_array_of_intervals_arrays(
        interval_array, *interval_arrays, validate_intervals=bool(interval_arrays)
    )
    if interval_arrays:
        stairs = _make_stairs(interval_array, *interval_arrays)
        result = stairs.max() <= 1
@@ -117,7 +120,10 @@ def isdisjoint(interval_array, *interval_arrays):
    else:
        arr = np.stack([interval_array.left.values, interval_array.right.values])
        arr = arr[arr[:, 0].argsort()]
        result = np.all(arr[0, 1:] >= arr[1, :-1])
        if interval_array.closed == "both":
            result = np.all(arr[0, 1:] > arr[1, :-1])
        else:
            result = np.all(arr[0, 1:] >= arr[1, :-1])
    return result
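The single-array branch of `isdisjoint` checks that, once intervals are ordered by left endpoint, each left endpoint is at or after the previous interval's right endpoint, and strictly after it when `closed == "both"`, because `[1, 3]` and `[3, 5]` share the point 3. A standalone sketch of that check follows, assuming plain endpoint sequences in place of a pandas `IntervalArray`; `isdisjoint_sketch` is a hypothetical name. Since `np.stack` places each interval in a column of a `(2, n)` array, the sketch reorders columns with `arr[:, arr[0].argsort()]`.

```python
import numpy as np


def isdisjoint_sketch(lefts, rights, closed):
    """Hypothetical single-array disjointness check for any *closed* value."""
    # np.stack gives shape (2, n): row 0 holds left endpoints, row 1 right endpoints.
    arr = np.stack([np.asarray(lefts), np.asarray(rights)])
    # Each interval is a column, so reorder the columns by left endpoint.
    arr = arr[:, arr[0].argsort()]
    next_lefts, prev_rights = arr[0, 1:], arr[1, :-1]
    if closed == "both":
        # [1, 3] and [3, 5] share the point 3, so touching counts as overlapping.
        return bool(np.all(next_lefts > prev_rights))
    # With an open endpoint, touching intervals such as (1, 3] and (3, 5] are disjoint.
    return bool(np.all(next_lefts >= prev_rights))


print(isdisjoint_sketch([3, 1], [5, 3], "both"))   # False
print(isdisjoint_sketch([3, 1], [5, 3], "right"))  # True
```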


@@ -185,6 +191,7 @@ def coverage(interval_array, domain=None):

@Appender(docstrings.complement_docstring, join="\n", indents=1)
def complement(interval_array, domain=None):
    _validate_intervals(interval_array)
    stepfunction = _interval_x_to_stairs(interval_array).invert()
    if isinstance(domain, (pd.IntervalIndex, pd.arrays.IntervalArray)):
        domain = _interval_x_to_stairs(domain)
@@ -200,11 +207,13 @@ def contains(interval_array, x, include_index=True):
    starts = interval_array.left.values
    ends = interval_array.right.values
    x = pd.Series(x).values
    if interval_array.closed == "right":
        result = np.less_equal.outer(x, ends) & np.greater.outer(x, starts)
    else:
        result = np.less.outer(x, ends) & np.greater_equal.outer(x, starts)
    result = result.transpose()
    right_compare = (
        np.less_equal if interval_array.closed in ("right", "both") else np.less
    )
    left_compare = (
        np.greater_equal if interval_array.closed in ("left", "both") else np.greater
    )
    result = (right_compare.outer(x, ends) & left_compare.outer(x, starts)).transpose()
    if include_index:
        return pd.DataFrame(result, index=interval_array, columns=x)
    return result
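The rewritten `contains` picks each comparison ufunc from the *closed* value instead of branching on `"right"` alone. The same logic, lifted into a standalone function over plain endpoint arrays, is sketched below; `contains_sketch` is a hypothetical name and NumPy arrays stand in for a pandas `IntervalArray`. With the intervals `(1, 5]`, `(3, 6]` and `(2, 4]` it reproduces the matrix from the README's `arr.piso.contains([2, 3, 5])` example.

```python
import numpy as np


def contains_sketch(lefts, rights, closed, x):
    """Hypothetical: entry [i, j] is True if interval i contains x[j]."""
    starts, ends, x = np.asarray(lefts), np.asarray(rights), np.asarray(x)
    # Closed right edge -> x <= end; open right edge -> x < end.
    right_compare = np.less_equal if closed in ("right", "both") else np.less
    # Closed left edge -> x >= start; open left edge -> x > start.
    left_compare = np.greater_equal if closed in ("left", "both") else np.greater
    # ufunc.outer builds len(x) x n matrices; transpose to n x len(x).
    return (right_compare.outer(x, ends) & left_compare.outer(x, starts)).T


# Rows: (1, 5], (3, 6], (2, 4]; columns: x = 2, 3, 5.
print(contains_sketch([1, 3, 2], [5, 6, 4], "right", [2, 3, 5]).tolist())
# [[True, True, True], [False, False, True], [False, True, False]]
```

Choosing the ufunc up front keeps a single vectorised expression for all four *closed* values rather than four separate branches.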
2 changes: 1 addition & 1 deletion piso/util.py
@@ -7,7 +7,7 @@ def _validate_intervals(interval_array):
    if not all(interval_array.length):  # test for degenerate intervals
        raise DegenerateIntervalError(interval_array)
    if interval_array.closed not in ("left", "right"):
        raise ClosedValueError
        raise ClosedValueError(interval_array.closed)


def _interval_x_to_stairs(interval_array):
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -6,8 +6,8 @@ build-backend = "poetry.masonry.api"

[tool.poetry]
name = "piso"
version = "0.5.0"
description = "Pandas Interval Set Operations: methods for set operations for pandas' Interval, IntervalArray and IntervalIndex"
version = "0.6.0"
description = "Pandas Interval Set Operations: methods for set operations, analytics, lookups and joins on pandas' Interval, IntervalArray and IntervalIndex"
readme = "README.md"
authors = ["Riley Clement <[email protected]>"]
maintainers = ["Riley Clement <[email protected]>"]