Merge branch 'master' into release
venaturum committed Nov 2, 2021
2 parents 26be802 + 7b69907 commit 77ea64e
Showing 20 changed files with 1,175 additions and 44 deletions.
2 changes: 1 addition & 1 deletion docs/getting_started/index.rst
@@ -36,7 +36,7 @@ The domain of the intervals can be either numerical, :class:`pandas.Timestamp` o
- have a finite, length
- are left-closed right-open, or right-closed left-open

A small :ref:`case study <user_guide.calendar_example>` using :mod:`piso` can be found in the :ref:`user guide <user_guide>`. Further examples, and a detailed explanation of functionality, are provided in the :ref:`api`.
Several :ref:`case studies <case_studies>` using :mod:`piso` can be found in the :ref:`user guide <user_guide>`. Further examples, and a detailed explanation of functionality, are provided in the :ref:`api`.


Versioning
1 change: 1 addition & 0 deletions docs/reference/accessors.rst
@@ -18,4 +18,5 @@ Accessors
ArrayAccessor.issubset
ArrayAccessor.coverage
ArrayAccessor.complement
ArrayAccessor.contains
ArrayAccessor.get_indexer
4 changes: 3 additions & 1 deletion docs/reference/package.rst
@@ -20,5 +20,7 @@ Top level functions
issubset
coverage
complement
contains
get_indexer
lookup
lookup
join
48 changes: 32 additions & 16 deletions docs/release_notes/index.rst
@@ -5,21 +5,37 @@ Release notes
========================


ADD UNRELEASED CHANGES ABOVE THIS LINE

**v0.5.0 2021-11-02**

Added the following methods

- :func:`piso.join` for *join operations* with interval indexes
- :func:`piso.contains`
- :meth:`ArrayAccessor.contains() <piso.accessor.ArrayAccessor.contains>`

Performance improvements for

- :func:`piso.lookup`
- :func:`piso.get_indexer`


**v0.4.0 2021-10-30**

Added the following methods

- :meth:`piso.lookup`
- :meth:`piso.get_indexer`
- :func:`piso.lookup`
- :func:`piso.get_indexer`
- :meth:`ArrayAccessor.get_indexer() <piso.accessor.ArrayAccessor.get_indexer>`


**v0.3.0 2021-10-23**

Added the following methods

- :meth:`piso.coverage`
- :meth:`piso.complement`
- :func:`piso.coverage`
- :func:`piso.complement`
- :meth:`ArrayAccessor.coverage() <piso.accessor.ArrayAccessor.coverage>`
- :meth:`ArrayAccessor.complement() <piso.accessor.ArrayAccessor.complement>`

@@ -28,9 +44,9 @@ Added the following methods

Added the following methods

- :meth:`piso.isdisjoint`
- :meth:`piso.issuperset`
- :meth:`piso.issubset`
- :func:`piso.isdisjoint`
- :func:`piso.issuperset`
- :func:`piso.issubset`
- :meth:`ArrayAccessor.isdisjoint() <piso.accessor.ArrayAccessor.isdisjoint>`
- :meth:`ArrayAccessor.issuperset() <piso.accessor.ArrayAccessor.issuperset>`
- :meth:`ArrayAccessor.issubset() <piso.accessor.ArrayAccessor.issubset>`
@@ -42,17 +58,17 @@ Added the following methods

The following methods are included in the initial release of `piso`

- :meth:`piso.register_accessors`
- :meth:`piso.union`
- :meth:`piso.intersection`
- :meth:`piso.difference`
- :meth:`piso.symmetric_difference`
- :func:`piso.register_accessors`
- :func:`piso.union`
- :func:`piso.intersection`
- :func:`piso.difference`
- :func:`piso.symmetric_difference`
- :meth:`ArrayAccessor.union() <piso.accessor.ArrayAccessor.union>`
- :meth:`ArrayAccessor.intersection() <piso.accessor.ArrayAccessor.intersection>`
- :meth:`ArrayAccessor.difference() <piso.accessor.ArrayAccessor.difference>`
- :meth:`ArrayAccessor.symmetric_difference() <piso.accessor.ArrayAccessor.symmetric_difference>`
- :meth:`piso.interval.union`
- :meth:`piso.interval.intersection`
- :meth:`piso.interval.difference`
- :meth:`piso.interval.symmetric_difference`
- :func:`piso.interval.union`
- :func:`piso.interval.intersection`
- :func:`piso.interval.difference`
- :func:`piso.interval.symmetric_difference`

2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -2,7 +2,7 @@ ipykernel
sphinx == 4.0.2
nbsphinx == 0.8.6
sphinx-panels
staircase
staircase >= 2.1
pandas
numpy
Pygments
112 changes: 112 additions & 0 deletions docs/user_guide/case_studies/football.rst
@@ -0,0 +1,112 @@
.. _user_guide.football_example:


Analysis of scores in a football match
=======================================

In this example we will look at a football match from 2009:

The Champions League quarter-final between Chelsea and Liverpool
in 2009 is recognised as among the best games of all time.
Liverpool scored twice in the first half in the 19th and 28th minute.
Chelsea then opened their account in the second half with three
unanswered goals in the 51st, 57th and 76th minute. Liverpool
responded with two goals in the 81st and 83rd minute to put themselves
ahead, however Chelsea drew with a goal in the 89th minute and advanced
to the next stage on aggregate.


We start by importing :mod:`pandas` and :mod:`piso`

.. ipython:: python

    import pandas as pd
    import piso

For the analysis we will create, for each team, a :class:`pandas.Series` indexed by a :class:`pandas.IntervalIndex`. The values of each Series will be the team's score, and the interval index, built from :class:`pandas.Timedelta` values, will describe the period over which each score was held. The following function creates such a Series, given the minute marks of the goals.

.. ipython:: python

    def make_series(goal_time_mins):
        breaks = pd.to_timedelta([0] + goal_time_mins + [90], unit="min")
        ii = pd.IntervalIndex.from_breaks(breaks)
        return pd.Series(range(len(ii)), index=ii, name="score")

We can now create each Series.

.. ipython:: python

    chelsea = make_series([51,57,76,89])
    liverpool = make_series([19,28,81,83])

For reference, the Series corresponding to `chelsea` is

.. ipython:: python

    chelsea

To enable analysis of the separate halves of the game, we'll define a similar Series which holds the time interval for each half.

.. ipython:: python

    halves = pd.Series(
        ["1st", "2nd"],
        pd.IntervalIndex.from_breaks(pd.to_timedelta([0, 45, 90], unit="min")),
        name="half",
    )
    halves

We can now perform a join on these three Series. Since the `chelsea` and `liverpool` Series have the same name, suffixes are needed to differentiate the columns in the result. The `halves` Series has a different name, but a suffix must still be supplied for every join operand whenever any names overlap.

.. ipython:: python

    CvsL = piso.join(chelsea, liverpool, halves, suffixes=["_chelsea", "_liverpool", ""])
    CvsL

By default, the :func:`piso.join` function performs a left-join. Since every interval index represents the same domain, that is `(0', 90']`, all join types - *left*, *right*, *inner*, *outer* - will give the same result.

Using this dataframe we will now provide answers for miscellaneous questions. In particular we will filter the dataframe based on values in the columns, then sum the lengths of the intervals in the filtered index.
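
The filter-then-sum idea does not depend on :mod:`piso`. As a rough cross-check, the sketch below recomputes the same durations in plain Python from the goal minutes quoted above. The helper names ``score_at`` and ``minutes_where`` are hypothetical, introduced only for this illustration, and the sketch assumes right-closed intervals, which is the default for :meth:`pandas.IntervalIndex.from_breaks`.

```python
# Goal minutes taken from the match description above.
CHELSEA = [51, 57, 76, 89]
LIVERPOOL = [19, 28, 81, 83]


def score_at(goal_minutes, minute):
    """Score held at `minute`, treating each scoring period as (left, right]."""
    return sum(1 for goal in goal_minutes if goal < minute)


def minutes_where(predicate, window=(0, 90)):
    """Total minutes inside `window` for which predicate(chelsea, liverpool) holds."""
    start, end = window
    goals_inside = [g for g in CHELSEA + LIVERPOOL if start < g < end]
    breaks = sorted({start, end, *goals_inside})
    total = 0
    for left, right in zip(breaks[:-1], breaks[1:]):
        # Both scores are constant on (left, right], so sampling at `right` suffices.
        if predicate(score_at(CHELSEA, right), score_at(LIVERPOOL, right)):
            total += right - left
    return total


chelsea_led = minutes_where(lambda c, l: c > l)
liverpool_led = minutes_where(lambda c, l: l > c)
tied = minutes_where(lambda c, l: c == l)
tied_first_half = minutes_where(lambda c, l: c == l, window=(0, 45))

print(chelsea_led, liverpool_led, tied, tied_first_half)  # 5 44 41 19
```

The three full-match totals add up to the 90 minutes of play, which gives a quick sanity check on the dataframe queries that follow.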


**How much game time did Chelsea lead for?**

.. ipython:: python

    CvsL.query("score_chelsea > score_liverpool").index.length.sum()

**How much game time did Liverpool lead for?**

.. ipython:: python

    CvsL.query("score_liverpool > score_chelsea").index.length.sum()

**How much game time were the teams tied for?**

.. ipython:: python

    CvsL.query("score_liverpool == score_chelsea").index.length.sum()

**How much game time in the first half were the teams tied for?**

.. ipython:: python

    CvsL.query("score_chelsea == score_liverpool and half == '1st'").index.length.sum()

**For how long did Liverpool lead Chelsea by exactly one goal (split by half)?**

.. ipython:: python

    CvsL.groupby("half").apply(
        lambda df: df.query("score_liverpool - score_chelsea == 1").index.length.sum()
    )

**What was the score at the 80 minute mark?**

.. ipython:: python

    piso.lookup(CvsL, pd.Timedelta(80, unit="min"))

This analysis is also straightforward using :mod:`staircase`. For more information, please see the :ref:`corresponding example with staircase <user_guide.football_staircase_example>`.
128 changes: 128 additions & 0 deletions docs/user_guide/case_studies/football_staircase.rst
@@ -0,0 +1,128 @@
.. _user_guide.football_staircase_example:


Analysis of scores in a football match (using staircase)
===========================================================

.. ipython:: python
    :suppress:

    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    plt.style.use('seaborn')

This example demonstrates how :mod:`staircase` can be used to mirror the functionality
and analysis presented in the :ref:`corresponding example with piso <user_guide.football_example>`.

The Champions League quarter-final between Chelsea and Liverpool
in 2009 is recognised as among the best games of all time.
Liverpool scored twice in the first half in the 19th and 28th minute.
Chelsea then opened their account in the second half with three
unanswered goals in the 51st, 57th and 76th minute. Liverpool
responded with two goals in the 81st and 83rd minute to put themselves
ahead, however Chelsea drew with a goal in the 89th minute and advanced
to the next stage on aggregate.


We start by importing :mod:`pandas` and :mod:`staircase`

.. ipython:: python

    import pandas as pd
    import staircase as sc

For the analysis we will create a :class:`staircase.Stairs` for each team, and wrap them up in a :class:`pandas.Series` which is indexed by the club names. Using a Series in this way is by no means necessary but can be useful. We'll create a function `make_stairs` which takes the minute marks of the goals and returns a :class:`staircase.Stairs`. Each step function will be monotonically non-decreasing.

.. ipython:: python

    def make_stairs(goal_time_mins):
        breaks = pd.to_timedelta(goal_time_mins, unit="min")
        return sc.Stairs(start=breaks).clip(pd.Timedelta(0), pd.Timedelta("90m"))

    scores = pd.Series(
        {
            "chelsea":make_stairs([51,57,76,89]),
            "liverpool":make_stairs([19,28,81,83]),
        }
    )
    scores

To clarify we plot these step functions below.

.. ipython:: python
    :suppress:

    fig, axes = plt.subplots(ncols=2, figsize=(8,3), sharey=True)
    vals = scores["chelsea"].step_values
    vals.index = vals.index/pd.Timedelta("1min")
    sc.Stairs.from_values(0, vals).plot(axes[0])
    axes[0].set_title("Chelsea")
    axes[0].set_xlabel("time (mins)")
    axes[0].set_ylabel("score")
    axes[0].yaxis.set_major_locator(ticker.MultipleLocator())
    axes[0].set_xlim(0,90)
    vals = scores["liverpool"].step_values
    vals.index = vals.index/pd.Timedelta("1min")
    sc.Stairs.from_values(0, vals).plot(axes[1])
    axes[1].set_title("Liverpool")
    axes[1].set_xlabel("time (mins)")
    axes[1].set_ylabel("score")
    @savefig case_study_football_staircase.png
    plt.tight_layout();

To enable analysis of the separate halves of the game, we'll define a similar Series which holds the time interval for each half as a tuple of :class:`pandas.Timedelta` objects.

.. ipython:: python

    halves = pd.Series(
        {
            "1st":(pd.Timedelta(0), pd.Timedelta("45m")),
            "2nd":(pd.Timedelta("45m"), pd.Timedelta("90m")),
        }
    )
    halves

We can now use our *scores* and *halves* Series to provide answers for miscellaneous questions. Note that comparing :class:`staircase.Stairs` objects with relational operators produces boolean-valued step functions (Stairs objects). Finding the integral of these boolean step functions is equivalent to summing up lengths of intervals in the domain where the step function is equal to one.
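
To make that equivalence concrete, here is a small sketch in plain Python rather than with :mod:`staircase`. The names ``make_step_function`` and ``integral_of_indicator`` are hypothetical helpers, not part of any library. Each score is modelled as a step function, a comparison yields a 0/1 step function, and its integral is accumulated segment by segment. Sampling at segment midpoints sidesteps any assumption about which interval endpoints are closed.

```python
from bisect import bisect_right


def make_step_function(goal_minutes):
    """Return f(t): the number of goals scored at or before minute t."""
    starts = sorted(goal_minutes)
    return lambda t: bisect_right(starts, t)


def integral_of_indicator(condition, breaks):
    """Integrate a 0/1 step function that is constant between consecutive breaks."""
    total = 0
    for left, right in zip(breaks[:-1], breaks[1:]):
        midpoint = (left + right) / 2
        # Each segment contributes its full length when the condition holds, else 0.
        total += (right - left) * int(condition(midpoint))
    return total


chelsea = make_step_function([51, 57, 76, 89])
liverpool = make_step_function([19, 28, 81, 83])

# Every goal minute, plus kick-off and full time, is a potential change point.
breaks = [0, 19, 28, 51, 57, 76, 81, 83, 89, 90]

chelsea_leads = integral_of_indicator(lambda t: chelsea(t) > liverpool(t), breaks)
liverpool_leads = integral_of_indicator(lambda t: chelsea(t) < liverpool(t), breaks)
tied = integral_of_indicator(lambda t: chelsea(t) == liverpool(t), breaks)

print(chelsea_leads, liverpool_leads, tied)  # 5 44 41
```

Because the indicator is either 0 or 1 on every segment, the integral is exactly the summed length of the segments on which the comparison is true.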

**How much game time did Chelsea lead for?**

.. ipython:: python

    (scores["chelsea"] > scores["liverpool"]).integral()

**How much game time did Liverpool lead for?**

.. ipython:: python

    (scores["chelsea"] < scores["liverpool"]).integral()

**How much game time were the teams tied for?**

.. ipython:: python

    (scores["chelsea"] == scores["liverpool"]).integral()

**How much game time in the first half were the teams tied for?**

.. ipython:: python

    (scores["chelsea"] == scores["liverpool"]).where(halves["1st"]).integral()

**For how long did Liverpool lead Chelsea by exactly one goal (split by half)?**

.. ipython:: python

    halves.apply(lambda x:
        (scores["liverpool"]==scores["chelsea"]+1).where(x).integral()
    )

**What was the score at the 80 minute mark?**

.. ipython:: python

    sc.sample(scores, pd.Timedelta("80m"))