class BaseSegmenter(BaseSeriesEstimator, ABC):
"""Base class for segmentation algorithms.

- Segmenters take a single time series of length $m$ and returns a segmentation.
- Series can be univariate (single series) or multivariate, with $d$ dimensions.
-
- Input and internal data format
- Univariate series:
- Numpy array:
- shape `(m,)`, `(m, 1)` or `(1, m)`. if ``self`` has no multivariate
- capability, i.e.``self.get_tag(
- ""capability:multivariate") == False``, all are converted to 1D
- numpy `(m,)`
- if ``self`` has multivariate capability, converted to 2D numpy `(m,1)` or
- `(1, m)` depending on axis
- pandas DataFrame or Series:
- DataFrame single column shape `(m,1)`, `(1,m)` or Series shape `(m,)`
- if ``self`` has no multivariate capability, all converted to Series `(m,)`
- if ``self`` has multivariate capability, all converted to Pandas DataFrame
- shape `(m,1)`, `(1,m)` depending on axis
-
- Multivariate series:
- Numpy array, shape `(m,d)` or `(d,m)`.
- pandas DataFrame `(m,d)` or `(d,m)`
-
- Conversion and axis resolution for multivariate
-
- Conversion between numpy and pandas is handled by the base class. Sub classses
- can assume the data is in the correct format (determined by
- ``"X_inner_type"``, one of ``aeon.base._base_series.VALID_INNER_TYPES)`` and
- represented with the expected
- axis.
-
- Multivariate series are segmented along an axis determined by ``self.axis``.
- Axis plays two roles:
-
- 1) the axis the segmenter expects the data to be in for its internal methods
- ``_fit`` and ``_predict``: 0 means each column is a time series, and the data is
- shaped `(m,d)`, axis equal to 1 means each row is a time series, sometimes
- called wide format, and the whole series is shape `(d,m)`. This should be set
- for a given child class through the BaseSegmenter constructor.
-
- 2) The optional ``axis`` argument passed to the base class ``fit`` and
- ``predict`` methods. If the data ``axis`` is different to the ``axis``
- expected (i.e. value stored in ``self.axis``, then it is transposed in this
- base class if self has multivariate capability.
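The univariate conversion rules listed above can be sketched with NumPy alone. The helper `to_univariate_1d` is hypothetical and not part of aeon. It assumes NumPy input and a segmenter whose ``"capability:multivariate"`` tag is False. It does not reproduce the pandas branch, where the base class is described as producing a Series.

```python
import numpy as np


def to_univariate_1d(X):
    """Collapse a univariate series to a 1D array of shape (m,).

    Hypothetical helper for illustration only. Accepts array-likes of shape
    (m,), (m, 1) or (1, m), matching the rules above for a segmenter whose
    "capability:multivariate" tag is False.
    """
    X = np.asarray(X)
    if X.ndim == 1:
        return X
    if X.ndim == 2 and 1 in X.shape:
        # ravel() drops the singleton dimension, whichever axis it is on
        return X.ravel()
    raise ValueError(f"expected a univariate series, got shape {X.shape}")
```

For instance, `to_univariate_1d(np.zeros((1, 5)))` returns an array of shape `(5,)`. A genuinely two-dimensional input such as shape `(2, 3)` raises a `ValueError` in this sketch.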
+ Segmenters take a single time series of length ``n_timepoints`` and return a
+ segmentation. Series can be univariate (single series) or multivariate,
+ with ``n_channels`` dimensions. If the segmenter can handle multivariate series,
+ it will have the tag ``"capability:multivariate"`` set to True. Multivariate
+ series are segmented along the time axis, which is determined by ``self.axis``.
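When the orientation of the data passed to ``fit`` or ``predict`` differs from the orientation the segmenter expects, the base class is described as transposing it. Below is a minimal NumPy sketch of that rule, assuming a segmenter with multivariate capability. `resolve_axis` and its `inner_axis` parameter are hypothetical names, and `inner_axis` stands in for ``self.axis``.

```python
import numpy as np


def resolve_axis(X, axis, inner_axis):
    """Return X oriented so that time runs along ``inner_axis``.

    Hypothetical helper for illustration only.
    axis == 0: shape (n_timepoints, n_channels), each column is a channel.
    axis == 1: shape (n_channels, n_timepoints), each row is a channel.
    ``axis`` describes the data as passed in; ``inner_axis`` stands in for
    the value a segmenter stores in ``self.axis``.
    """
    if axis not in (0, 1) or inner_axis not in (0, 1):
        raise ValueError("axis and inner_axis must be 0 or 1")
    X = np.asarray(X)
    if X.ndim == 2 and axis != inner_axis:
        # Swapping rows and columns moves time onto the expected axis
        return X.T
    # 1D input, or 2D input already in the expected orientation
    return X


# Three channels of 100 time points, passed in wide format (axis=1)
X = np.random.default_rng(0).normal(size=(3, 100))
print(resolve_axis(X, axis=1, inner_axis=0).shape)  # (100, 3)
```

`X.T` returns a view, so this orientation change does not copy the underlying data.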

Segmentation representation

Given a time series of 10 points with two change points found at positions 4
and 8.

The segmentation can be output in two forms:
- a) A list of change points.
+ a) A list of change points (tag ``"returns_dense"`` is True).
For example, output [4,8] for a series of length 10 means three segments at
positions (0,1,2,3), (4,5,6,7) and (8,9).
This dense representation is the default behaviour, as it is the minimal
@@ -76,7 +38,8 @@ class BaseSegmenter(BaseSeriesEstimator, ABC):
last less than the series length. If the last value is
``n_timepoints-1`` then the last point forms a single segment. An empty
list indicates no change points.
- b) A list of integers of length m indicating the segment of each time point:
+ b) A list of integers of length ``n_timepoints`` indicating the segment of each
+ time point (tag ``"returns_dense"`` is False).
output [0,0,0,0,1,1,1,1,2,2] or output [0,0,0,1,1,1,1,0,0,0]
This sparse representation can be used to indicate shared segments
indicating segment 1 is somehow the same (perhaps in generative process)
@@ -87,15 +50,16 @@ class BaseSegmenter(BaseSeriesEstimator, ABC):

Parameters
----------
- n_segments : int, default = 2
- Number of segments to split the time series into. If None, then the number of
- segments needs to be found in fit.
- axis : int, default = 1
+ axis : int
Axis along which to segment if passed a multivariate series (2D input). If axis
is 0, it is assumed each column is a time series and each row is a
timepoint, i.e. the shape of the data is ``(n_timepoints, n_channels)``.
``axis == 1`` indicates the time series are in rows, i.e. the shape of the data
- is ``(n_channels, n_timepoints)`.
+ is ``(n_channels, n_timepoints)``. Each segmenter must specify the axis it
+ assumes in its constructor and pass it to the base class.
+ n_segments : int, default = 2
+ Number of segments to split the time series into. If None, then the number of
+ segments needs to be found in fit.

"""
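The dense and sparse forms described in the docstring above carry the same boundaries, so one can be derived from the other. The two functions below are hypothetical helpers written for illustration, not aeon functions.

```python
import numpy as np


def dense_to_sparse(change_points, n_timepoints):
    """Hypothetical helper: change points -> segment label per time point."""
    labels = np.zeros(n_timepoints, dtype=int)
    for cp in change_points:
        # every time point from a change point onward moves to the next segment
        labels[cp:] += 1
    return labels.tolist()


def sparse_to_dense(labels):
    """Hypothetical helper: segment labels -> change points.

    A change point is any index whose label differs from the previous one,
    so shared labels such as [0,0,0,1,1,1,1,0,0,0] still yield two changes.
    """
    labels = np.asarray(labels)
    return (np.flatnonzero(labels[1:] != labels[:-1]) + 1).tolist()
```

`dense_to_sparse([4, 8], 10)` gives `[0, 0, 0, 0, 1, 1, 1, 1, 2, 2]`. Going the other way, `sparse_to_dense([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])` gives `[3, 7]`. The boundaries survive, but the fact that the first and last segments share a label is lost. Only the sparse form can express that information.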
@@ -124,7 +88,7 @@ def fit(self, X, y=None, axis=1):
Parameters
----------
X : One of ``VALID_INPUT_TYPES``
- Input time series
+ Input time series to fit a segmenter.
y : One of ``VALID_INPUT_TYPES`` or None, default None
Training time series, a labeled 1D series of the same length as X for supervised
segmentation.