
Commit 3a01b53

[DOC] Update the docstring for BaseSegmenter (#1741)
* remove dependency
* segmenter docstring
* segmenter docstring
* Update minirocket.ipynb
1 parent 4405c56 commit 3a01b53

1 file changed, +15 -51 lines changed

aeon/segmentation/base.py

+15-51
@@ -16,57 +16,19 @@
 class BaseSegmenter(BaseSeriesEstimator, ABC):
 """Base class for segmentation algorithms.
 
-Segmenters take a single time series of length $m$ and returns a segmentation.
-Series can be univariate (single series) or multivariate, with $d$ dimensions.
-
-Input and internal data format
-Univariate series:
-Numpy array:
-shape `(m,)`, `(m, 1)` or `(1, m)`. if ``self`` has no multivariate
-capability, i.e.``self.get_tag(
-""capability:multivariate") == False``, all are converted to 1D
-numpy `(m,)`
-if ``self`` has multivariate capability, converted to 2D numpy `(m,1)` or
-`(1, m)` depending on axis
-pandas DataFrame or Series:
-DataFrame single column shape `(m,1)`, `(1,m)` or Series shape `(m,)`
-if ``self`` has no multivariate capability, all converted to Series `(m,)`
-if ``self`` has multivariate capability, all converted to Pandas DataFrame
-shape `(m,1)`, `(1,m)` depending on axis
-
-Multivariate series:
-Numpy array, shape `(m,d)` or `(d,m)`.
-pandas DataFrame `(m,d)` or `(d,m)`
-
-Conversion and axis resolution for multivariate
-
-Conversion between numpy and pandas is handled by the base class. Sub classses
-can assume the data is in the correct format (determined by
-``"X_inner_type"``, one of ``aeon.base._base_series.VALID_INNER_TYPES)`` and
-represented with the expected
-axis.
-
-Multivariate series are segmented along an axis determined by ``self.axis``.
-Axis plays two roles:
-
-1) the axis the segmenter expects the data to be in for its internal methods
-``_fit`` and ``_predict``: 0 means each column is a time series, and the data is
-shaped `(m,d)`, axis equal to 1 means each row is a time series, sometimes
-called wide format, and the whole series is shape `(d,m)`. This should be set
-for a given child class through the BaseSegmenter constructor.
-
-2) The optional ``axis`` argument passed to the base class ``fit`` and
-``predict`` methods. If the data ``axis`` is different to the ``axis``
-expected (i.e. value stored in ``self.axis``, then it is transposed in this
-base class if self has multivariate capability.
+Segmenters take a single time series of length ``n_timepoints`` and returns a
+segmentation. Series can be univariate (single series) or multivariate,
+with ``n_channels`` dimensions. If the segmenter can handle multivariate series,
+if will have the tag ``"capability:multivariate"`` set to True. Multivariate
+series are segmented along a the axis of time determined by ``self.axis``.
 
 Segmentation representation
 
 Given a time series of 10 points with two change points found in position 4
 and 8.
 
 The segmentation can be output in two forms:
-a) A list of change points.
+a) A list of change points (tag ``"returns_dense"`` is True).
 output example [4,8] for a series length 10 means three segments at
 positions (0,1,2,3), (4,5,6,7) and (8,9).
 This dense representation is the default behaviour, as it is the minimal
@@ -76,7 +38,8 @@ class BaseSegmenter(BaseSeriesEstimator, ABC):
 last less than the series length. If the last value is
 ``n_timepoints-1`` then the last point forms a single segment. An empty
 list indicates no change points.
-b) A list of integers of length m indicating the segment of each time point:
+b) A list of integers of length m indicating the segment of each time point (
+tag ``"returns_dense"`` is False).
 output [0,0,0,0,1,1,1,1,2,2] or output [0,0,0,1,1,1,1,0,0,0]
 This sparse representation can be used to indicate shared segments
 indicating segment 1 is somehow the same (perhaps in generative process)
@@ -87,15 +50,16 @@ class BaseSegmenter(BaseSeriesEstimator, ABC):
 
 Parameters
 ----------
-n_segments : int, default = 2
-Number of segments to split the time series into. If None, then the number of
-segments needs to be found in fit.
-axis : int, default = 1
+axis : int
 Axis along which to segment if passed a multivariate series (2D input). If axis
 is 0, it is assumed each column is a time series and each row is a
 timepoint. i.e. the shape of the data is ``(n_timepoints,n_channels)``.
 ``axis == 1`` indicates the time series are in rows, i.e. the shape of the data
-is ``(n_channels, n_timepoints)`.
+is ``(n_channels, n_timepoints)`. Each segmenter must specify the axis it
+assumes in the constructor and pass it to the base class.
+n_segments : int, default = 2
+Number of segments to split the time series into. If None, then the number of
+segments needs to be found in fit.
 
 """
 
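The class docstring above describes two output forms for the same segmentation: a list of change points (tag "returns_dense" True) and one segment label per time point (tag "returns_dense" False). The NumPy sketch below converts between them using the docstring's own example. The helper names `change_points_to_labels` and `labels_to_change_points` are hypothetical, not aeon API, and the sketch assumes change points are strictly increasing and lie in 1..n_timepoints-1.

```python
import numpy as np


def change_points_to_labels(change_points, n_timepoints):
    """Hypothetical helper: expand change points into one segment label per time point."""
    cps = np.asarray(change_points, dtype=int)
    # Assumed validity rules: strictly increasing, first > 0, last < n_timepoints.
    if cps.size and (cps[0] <= 0 or cps[-1] >= n_timepoints or np.any(np.diff(cps) <= 0)):
        raise ValueError("change points must be increasing and inside (0, n_timepoints)")
    # The label of time point t is the number of change points <= t.
    return np.searchsorted(cps, np.arange(n_timepoints), side="right")


def labels_to_change_points(labels):
    """Hypothetical helper: a change point is any index whose label differs from the previous one."""
    labels = np.asarray(labels)
    return (np.flatnonzero(labels[1:] != labels[:-1]) + 1).tolist()


print(change_points_to_labels([4, 8], 10).tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(labels_to_change_points([0, 0, 0, 1, 1, 1, 1, 0, 0, 0]))  # [3, 7]
```

Going from labels back to change points is lossy: [0,0,0,1,1,1,1,0,0,0] becomes [3, 7], which expands to [0,0,0,1,1,1,1,2,2,2]. The fact that the first and last segments share a label is dropped, which is why only the per-point form can express shared segments.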

@@ -124,7 +88,7 @@ def fit(self, X, y=None, axis=1):
 Parameters
 ----------
 X : One of ``VALID_INPUT_TYPES``
-Input time series
+Input time series to fit a segmenter.
 y : One of ``VALID_INPUT_TYPES`` or None, default None
 Training time series, a labeled 1D series same length as X for supervised
 segmentation.
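
The updated docstring asks each segmenter to fix `self.axis` in its constructor, while `fit(X, y=None, axis=1)` lets the caller describe the layout of `X`. The toy classes below sketch that contract under assumptions taken from the text: 2D input is transposed when the caller's `axis` differs from `self.axis` (the removed docstring says the base class does this for multivariate-capable segmenters; the capability tag check and pandas conversion are omitted here), a 1D series becomes a single channel in the expected layout, and `y`, when given, must be 1D with one label per time point. `ToyBaseSegmenter` and `EqualSplitSegmenter` are hypothetical stand-ins, not aeon classes.

```python
import numpy as np


class ToyBaseSegmenter:
    """Hypothetical stand-in for a segmenter base class; not aeon's BaseSegmenter."""

    def __init__(self, axis, n_segments=2):
        # Layout _fit/_predict expect: 0 -> (n_timepoints, n_channels),
        # 1 -> (n_channels, n_timepoints).
        self.axis = axis
        self.n_segments = n_segments

    def _to_inner(self, X, axis):
        X = np.asarray(X)
        if X.ndim == 1:
            # A 1D series becomes a single channel laid out as self.axis expects.
            return X.reshape(-1, 1) if self.axis == 0 else X.reshape(1, -1)
        # Transpose 2D input whose layout differs from the one the subclass declared.
        return X.T if axis != self.axis else X

    def fit(self, X, y=None, axis=1):
        X = self._to_inner(X, axis)
        n_timepoints = X.shape[self.axis]  # time runs along self.axis after _to_inner
        if y is not None and np.asarray(y).shape != (n_timepoints,):
            raise ValueError("y must be 1D with one label per time point")
        self._fit(X, y)
        return self

    def predict(self, X, axis=1):
        return self._predict(self._to_inner(X, axis))

    def _fit(self, X, y):
        """Default: nothing to learn (unsupervised)."""

    def _predict(self, X):
        raise NotImplementedError


class EqualSplitSegmenter(ToyBaseSegmenter):
    """Hypothetical segmenter: n_segments near-equal parts on (n_timepoints, n_channels) data."""

    def __init__(self, n_segments=2):
        # The subclass declares its internal layout and passes it to the base class.
        super().__init__(axis=0, n_segments=n_segments)

    def _predict(self, X):
        n_timepoints = X.shape[0]
        # Returns a list of change points (the "returns_dense" form).
        return [i * n_timepoints // self.n_segments for i in range(1, self.n_segments)]


X = np.arange(30.0).reshape(3, 10)  # 3 channels, 10 time points: the axis=1 layout
segmenter = EqualSplitSegmenter(n_segments=3).fit(X, axis=1)
print(segmenter.predict(X, axis=1))  # [3, 6]
```

Keeping the transposition in the base class means a subclass such as `EqualSplitSegmenter` only ever sees one layout, so its `_predict` can index time on axis 0 without checking how the caller arranged the data.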
