Panel data is pandas, clustering expects numpy #603
-
Hi, I am looking for clarification on the recommended data formats and workflow for aeon's clustering algorithms. My data is structured as follows: there are three observational units, each with 1000 time series of 401 time steps. The data was generated by a simulation model that we ran 1000 times, recording the behavior of three key state variables over 400 time steps (plus one initial value). I have converted this data into a pandas MultiIndex "panel", as recommended in https://www.aeon-toolkit.org/en/latest/examples/forecasting/forecasting_hierarchical_global.html#Panels-(=-flat-collections)-of-time-series---Panel-scitype,-%22pd-multiindex%22-mtype. This gives me a pandas DataFrame with 401000 rows × 3 columns and a two-level index. When I pass this dataframe (or a slice of it) to an aeon clusterer (e.g. TimeSeriesKMedoids), I get an error message:
So the recommended data format for my panel data is a dataframe, but the clusterer expects a numpy array. This makes me doubt whether my workflow is correct. Should there be some kind of Transformer between the data and the clusterer? I can do something like
to go from the panel dataframe to a numpy array for a specific observational unit, but it seems odd to have to do that every time. Am I missing something in my workflow? On a side note, I found a comment saying that MultiIndex is not relevant for clustering (#37 (comment)). What is the rationale behind this? I would eventually like to cluster across all the observational units simultaneously, so the MultiIndex/panel format seems fitting.
Replies: 2 comments
-
Hi, clustering does not have the converters yet. Clustering, classification and regression all work internally with numpy arrays of shape (n_cases, n_channels, n_timepoints), so your data should be a numpy array of shape (1000, 1, 401). Classification and regression simply convert MultiIndex to numpy internally (as will clustering once we finish the base redesign). You can convert the data yourself, or use the converters in aeon. I don't know how to generate a random pd.MultiIndex, as I have never used them; we work with numpy almost exclusively, so the example below just converts back and forth :) MultiIndex seems to just be long format?
from aeon.datatypes import convert
import numpy as np
# 1000 cases, 1 channel, 401 time points
X = np.random.random(size=(1000, 1, 401))
# numpy3D -> pd-multiindex (long format)
multi = convert(X, from_type="numpy3D", to_type="pd-multiindex")
print(f" type = {type(multi)} shape = {multi.shape}")
# pd-multiindex -> numpy3D
X2 = convert(multi, from_type="pd-multiindex", to_type="numpy3D")
print(f" type = {type(X2)} shape = {X2.shape}")
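For reference, the same reshaping can also be sketched with plain pandas and numpy, without going through aeon's converters. This is only an illustration under stated assumptions: every case has the same number of time points, the index levels are ordered (case, time point), and the names used here (`instance`, `timepoint`, `var_0` to `var_2`, `panel`) are made up for the example and not taken from aeon or from the original data.

```python
import numpy as np
import pandas as pd

# Assumed sizes, taken from the question: 1000 runs, 3 state variables, 401 steps
n_cases, n_channels, n_timepoints = 1000, 3, 401

# Build a random long-format panel with a two-level (instance, timepoint) index.
# Level and column names are hypothetical.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [range(n_cases), range(n_timepoints)], names=["instance", "timepoint"]
)
panel = pd.DataFrame(
    rng.random((n_cases * n_timepoints, n_channels)),
    index=index,
    columns=["var_0", "var_1", "var_2"],
)

# After sorting, rows are grouped by instance and ordered by timepoint, so the
# values reshape to (n_cases, n_timepoints, n_channels); swapping the last two
# axes gives (n_cases, n_channels, n_timepoints).
X = (
    panel.sort_index()
    .to_numpy()
    .reshape(n_cases, n_timepoints, n_channels)
    .transpose(0, 2, 1)
)

# Keep a single variable as a univariate collection (channel axis retained)
X_single = X[:, [0], :]

print(panel.shape, X.shape, X_single.shape)
```

This only works for equal-length series with no missing time points; unequal lengths would need a different container than a single 3D array.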
-
OK, so my intuition was somewhat correct. Thank you for the clarification!