Panel data is pandas, clustering expects numpy #603

steipatr · 2023-07-27T09:09:40Z

Hi,

I am looking for clarification on recommended/expected data formats and workflow for aeon's clustering algorithms.

My data is structured as follows: there are three observational units, each with 1000 time series of 401 time steps each. This data was generated by a simulation model which we ran 1000 times, recording the behavior of three key state variables over 400 time steps (+ 1 initial value).

I have converted this data into a Pandas.MultiIndex "panel" as recommended in https://www.aeon-toolkit.org/en/latest/examples/forecasting/forecasting_hierarchical_global.html#Panels-(=-flat-collections)-of-time-series---Panel-scitype,-%22pd-multiindex%22-mtype. This gives me a Pandas.DataFrame with shape 401000 rows × 3 columns, and a two-level index.

When I pass this dataframe (or a slice of it) to an aeon clusterer (e.g. TimeSeriesKMedoids), I get an error message:

TypeError: X is not of a supported input data type.X must be of type np.ndarray, found <class 'pandas.core.series.Series'>.

So the recommended data format for my panel data is a dataframe, but the clusterer expects a numpy array. This makes me doubt whether my workflow is correct. Should there be some kind of Transformer between data and clusterer? I can do something like

outcomes_panel['Observational Unit 1'].unstack().to_numpy()

to go from panel dataframe to numpy array for a specific observational unit, but it seems weird to do that every time. Am I missing something in my workflow?
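For illustration, here is a minimal sketch of what that one-liner does, using only pandas and numpy on a toy panel. The index level names ("run", "time"), the toy sizes, and the random values are assumptions made for the example; only the column name and the `unstack().to_numpy()` call come from the post. Selecting one column of the panel yields a Series (which is what the error message complains about), and `unstack()` pivots the innermost index level into columns.

```python
import numpy as np
import pandas as pd

# Toy sizes for readability; the post has 1000 runs and 401 time steps.
n_cases, n_timepoints = 4, 5

# Two-level index (run, time). The level names are assumptions.
index = pd.MultiIndex.from_product(
    [range(n_cases), range(n_timepoints)], names=["run", "time"]
)
rng = np.random.default_rng(0)
outcomes_panel = pd.DataFrame(
    rng.random((n_cases * n_timepoints, 3)),
    index=index,
    columns=[
        "Observational Unit 1",
        "Observational Unit 2",
        "Observational Unit 3",
    ],
)

# Selecting one column gives a pandas Series, not a numpy array.
unit_1 = outcomes_panel["Observational Unit 1"]

# unstack() moves the innermost level (time) into columns:
# one row per run, one column per time step.
X_2d = unit_1.unstack().to_numpy()  # shape (4, 5)

# Add a singleton channel axis to get (n_cases, n_channels, n_timepoints).
X_3d = X_2d[:, np.newaxis, :]  # shape (4, 1, 5)

print(type(unit_1).__name__, X_2d.shape, X_3d.shape)
```

With the sizes from the post, the same steps would give an array of shape (1000, 1, 401) for one observational unit.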

On a side note, I found a comment that MultiIndex is not relevant for clustering (#37 (comment)). What is the rationale behind this? I would like to eventually simultaneously cluster across all the observational units, so the MultiIndex/panel format seems fitting.

Answered by TonyBagnall


TonyBagnall · 2023-07-27T10:48:50Z

Maintainer

Hi, clustering does not have the converters yet. Clustering, classification and regression all work internally with numpy arrays of shape (n_cases, n_channels, n_timepoints), so your data should be a numpy array of shape (1000, 1, 401). Classification and regression simply convert MultiIndex to numpy internally (as will clustering when we finish the base redesign). You can convert it yourself, or use the converters in aeon. I don't know how to generate a random pd.MultiIndex, as I have never used them; we work with numpy almost exclusively, so just convert back and forth :) MultiIndex seems to just be long format?

from aeon.datatypes import convert
import numpy as np

# 1000 cases, 1 channel, 401 time points
X = np.random.random(size=(1000, 1, 401))
# numpy3D -> pd-multiindex (long format)
multi = convert(X, from_type="numpy3D", to_type="pd-multiindex")
print(f" type = {type(multi)} shape = {multi.shape}")
# pd-multiindex -> numpy3D
X2 = convert(multi, from_type="pd-multiindex", to_type="numpy3D")
print(f" type = {type(X2)} shape = {X2.shape}")
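If you would rather do the conversion by hand, the following is a rough sketch using only pandas and numpy, not aeon's own converter. The helper name `panel_to_numpy3d` is hypothetical, and it assumes an equal-length panel with a two-level (case, time) index in which every case has every time point. Each column of the panel becomes one channel, which is one possible way to keep all three state variables together in a single (n_cases, n_channels, n_timepoints) array.

```python
import numpy as np
import pandas as pd


def panel_to_numpy3d(panel: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: long-format panel -> (n_cases, n_channels, n_timepoints).

    Assumes a two-level (case, time) MultiIndex and equal-length series.
    """
    n_cases = panel.index.get_level_values(0).nunique()
    n_timepoints = panel.index.get_level_values(1).nunique()
    n_channels = panel.shape[1]
    # Sort so that rows are ordered by case first, then by time.
    values = panel.sort_index().to_numpy()
    # Rows are (case, time) pairs: reshape to (case, time, channel) ...
    values = values.reshape(n_cases, n_timepoints, n_channels)
    # ... then swap the last two axes to get (case, channel, time).
    return values.transpose(0, 2, 1)


# Round-trip check on toy data: 4 cases, 3 channels, 5 time points.
rng = np.random.default_rng(0)
X = rng.random((4, 3, 5))

# Build the long-format panel: one row per (case, time), one column per channel.
index = pd.MultiIndex.from_product([range(4), range(5)], names=["case", "time"])
panel = pd.DataFrame(
    X.transpose(0, 2, 1).reshape(-1, 3),
    index=index,
    columns=["var_0", "var_1", "var_2"],
)

X_back = panel_to_numpy3d(panel)
print(panel.shape, X_back.shape)
```

For the data in the question, the same approach would turn the 401000 × 3 dataframe into an array of shape (1000, 3, 401). Whether treating the three variables as channels is the right modelling choice depends on what "clustering across all the observational units" should mean for your problem.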

steipatr · 2023-08-03T06:48:32Z

Author

OK, so my intuition was somewhat correct. Thank you for the clarification!
