
jmoralez/window_ops


Window ops

This library is intended as an alternative to pd.Series.rolling and pd.Series.expanding that gains a speedup by using numba-optimized functions operating on numpy arrays. It also provides online classes for efficiently updating window statistics as new samples arrive.

Install

PyPI

pip install window-ops

conda

conda install -c conda-forge window-ops

How to use

Transformations

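For transformations that map n_samples -> n_samples (the output has the same length as the input array), you can use the functions matching the pattern [seasonal_](rolling|expanding)_(mean|max|min|std), where the seasonal_ prefix is optional; for example rolling_mean, expanding_std or seasonal_rolling_max.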
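As a rough illustration of these n_samples -> n_samples semantics, here is a plain NumPy sketch of three of the transformations, without numba. The helper names are hypothetical stand-ins, not the library's functions, and the sketch assumes (1) a pandas-like convention where positions with fewer than window_size observations are NaN, (2) expanding statistics defined from the first sample, and (3) seasonal variants that apply the plain operation separately to each subsequence x[offset::season_length].

```python
import numpy as np


def rolling_mean_ref(x, window_size):
    """Hypothetical reference: mean of the last window_size values at each position.

    Assumption: positions before the window is full are NaN (like pandas' default).
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.full(x.shape[0], np.nan)
    if x.shape[0] < window_size:
        return out
    # Cumulative sums give every window sum in O(n): sum(x[i-w+1:i+1]) = c[i+1] - c[i+1-w]
    c = np.concatenate(([0.0], np.cumsum(x)))
    out[window_size - 1:] = (c[window_size:] - c[:-window_size]) / window_size
    return out


def expanding_mean_ref(x):
    """Hypothetical reference: mean of x[0..i] at each position i (defined from the first sample)."""
    x = np.asarray(x, dtype=np.float64)
    return np.cumsum(x) / np.arange(1, x.shape[0] + 1)


def seasonal_rolling_mean_ref(x, season_length, window_size):
    """Hypothetical reference: rolling mean computed separately within each season.

    Assumption: positions offset, offset + season_length, ... form one subsequence.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.full(x.shape[0], np.nan)
    for offset in range(season_length):
        out[offset::season_length] = rolling_mean_ref(x[offset::season_length], window_size)
    return out


x = np.arange(10, dtype=np.float64)
print(rolling_mean_ref(x, window_size=3))
print(expanding_mean_ref(x))
print(seasonal_rolling_mean_ref(x, season_length=2, window_size=2))
```

Every output has the same length as x, which is what distinguishes these transformations from aggregations that reduce the array to a single value.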

Benchmarks

pandas version: 1.3.5

n_samples = 10_000  # array size
window_size = 8  # for rolling operations
season_length = 7  # for seasonal operations
execute_times = 10  # number of times each function will be executed
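The README does not show how these averages were measured. A minimal harness in the same spirit might look like the following; it assumes each function is called execute_times times on a random array of n_samples values and that the mean wall-clock time is reported in milliseconds. average_time_ms is a hypothetical helper, and the untimed warm-up call is an assumption meant to keep numba's one-off JIT compilation out of the numbers.

```python
import timeit

import numpy as np


def average_time_ms(fn, *args, execute_times=10):
    """Hypothetical helper: mean wall-clock time of fn(*args) in milliseconds."""
    fn(*args)  # untimed warm-up call so one-off costs (e.g. numba JIT compilation) are excluded
    total_seconds = timeit.timeit(lambda: fn(*args), number=execute_times)
    return 1000 * total_seconds / execute_times


n_samples = 10_000  # array size
execute_times = 10  # number of timed calls per function
x = np.random.default_rng(0).random(n_samples)

# np.cumsum only stands in for the function under test
print(f"{average_time_ms(np.cumsum, x, execute_times=execute_times):.2f} ms")
```

To time a pandas baseline the same way you could pass something like `lambda a: pd.Series(a).rolling(window_size).mean()`; dividing the pandas average by the window_ops average gives the "times faster" figure.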

Average times in milliseconds.

times.applymap('{:.2f}'.format)
function                 window_ops  pandas
rolling_mean                   0.03    0.43
rolling_max                    0.14    0.57
rolling_min                    0.14    0.58
rolling_std                    0.06    0.54
expanding_mean                 0.03    0.31
expanding_max                  0.05    0.76
expanding_min                  0.05    0.47
expanding_std                  0.09    0.41
seasonal_rolling_mean          0.05    3.89
seasonal_rolling_max           0.18    4.27
seasonal_rolling_min           0.18    3.75
seasonal_rolling_std           0.08    4.38
seasonal_expanding_mean        0.04    3.18
seasonal_expanding_max         0.06    3.29
seasonal_expanding_min         0.06    3.28
seasonal_expanding_std         0.12    3.89
speedups = times['pandas'] / times['window_ops']
speedups = speedups.to_frame('times faster')
speedups.applymap('{:.0f}'.format)
function                 times faster
rolling_mean                       15
rolling_max                         4
rolling_min                         4
rolling_std                         9
expanding_mean                     12
expanding_max                      15
expanding_min                       9
expanding_std                       4
seasonal_rolling_mean              77
seasonal_rolling_max               23
seasonal_rolling_min               21
seasonal_rolling_std               52
seasonal_expanding_mean            78
seasonal_expanding_max             52
seasonal_expanding_min             51
seasonal_expanding_std             33

Online

If you have an array for which you want to compute a window statistic and then keep updating it as new samples come in, you can use the classes in the window_ops.online module. They all have a fit_transform method, which takes the array and returns the transformations defined above, and an update method, which takes a single value and returns the new statistic.
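The fit_transform/update pattern can be sketched for a rolling mean as below. OnlineRollingMean is a hypothetical stand-in, not the library's RollingMean class; it assumes fit_transform returns the same NaN-padded output as the batch transformation and that update appends one sample and returns the statistic for the newest window. Keeping only the last window_size values plus a running sum makes each update O(1) instead of recomputing over the whole array.

```python
from collections import deque

import numpy as np


class OnlineRollingMean:
    """Hypothetical online rolling mean: fit on an array, then update one sample at a time."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque(maxlen=window_size)  # holds only the last window_size samples
        self.total = 0.0  # running sum of the samples currently in the window

    def _push(self, value):
        if len(self.window) == self.window_size:
            self.total -= self.window[0]  # the oldest sample is about to leave the window
        self.window.append(value)  # deque(maxlen=...) drops the oldest sample automatically
        self.total += value
        if len(self.window) < self.window_size:
            return float("nan")  # assumption: NaN until the window is full
        return self.total / self.window_size

    def fit_transform(self, x):
        """Transform a whole array, leaving the state ready for updates."""
        self.window.clear()
        self.total = 0.0
        return np.array([self._push(float(v)) for v in x])

    def update(self, value):
        """Add one sample and return the rolling mean of the newest window."""
        return self._push(float(value))


rm = OnlineRollingMean(window_size=3)
print(rm.fit_transform(np.arange(5.0)))  # values: nan, nan, 1.0, 2.0, 3.0
print(rm.update(5.0))  # mean of 3, 4, 5 -> 4.0
```

Because the running sum is maintained by subtraction, floating-point error can accumulate over very long streams; a more careful version might periodically recompute the sum from the stored window.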

Benchmarks

Average time, in milliseconds, to transform the initial array with fit_transform and then perform 100 updates.

times.to_frame().applymap('{:.2f}'.format)
class                  average time (ms)
RollingMean                         0.12
RollingMax                          0.23
RollingMin                          0.22
RollingStd                          0.32
ExpandingMean                       0.10
ExpandingMax                        0.07
ExpandingMin                        0.07
ExpandingStd                        0.17
SeasonalRollingMean                 0.28
SeasonalRollingMax                  0.35
SeasonalRollingMin                  0.38
SeasonalRollingStd                  0.42
SeasonalExpandingMean               0.17
SeasonalExpandingMax                0.14
SeasonalExpandingMin                0.15
SeasonalExpandingStd                0.23