From version 2.5.0 on, we use the semantic versioning scheme:
Version X.Y.Z stands for:
- X = Major version: if any backwards incompatible changes are introduced to the public API
- Y = Minor version: if new, backwards compatible functionality is introduced to the public API
- Z = Patch version: if only backwards compatible bug fixes are introduced
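As a rough illustration of the scheme above, the following sketch computes the next version number for each kind of change. The helper names `parse_version` and `bump` are hypothetical and are not part of this package.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an 'X.Y.Z' string into (major, minor, patch) integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":  # backwards incompatible change to the public API
        return f"{major + 1}.0.0"
    if change == "minor":  # new, backwards compatible functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backwards compatible bug fixes only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change}")
```

For example, a bug fix on top of 2.5.0 gives 2.5.1, while a breaking change gives 3.0.0.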
- The `dump` and `load` functions are now inherited from the `BaseTimeseriesRegressor`.
- Added abstract functions `dump_parameters` and `load_parameters` for dumping and loading model files.
- Implemented `dump_parameters` and `load_parameters` for models.
- Outliers in the `_interactive_quantile_plot` and `_static_quantile_plot` functions must now be within or equal to the quantile boundaries.
- Added the option to forcefully overwrite the optimizer.
- Removed support for Python 3.9
- Updated pandas, sqlalchemy, tensorflow, numpy, and scikit-learn.
- Implemented necessary changes to keep behaviour unchanged.
- Make `'ID'` and `'TYPE'` columns `pd.Categorical` instead of `str`, to reduce the memory spike when using `pd.pivot_table` in `sam_format_to_wide`.
- Added parameter in `QuantileRegressor` to use HiGHS solver, as recommended in https://docs.scipy.org/doc/scipy/reference/optimize.linprog-highs.html. This will also keep the package compatible with future versions of SciPy.
- Allow numpy versions up to 1.23.x. 1.24 is not yet supported by shap (and shap does not specify this constraint in its requirements). For future reference, note that numpy 1.24 is also not supported by h5py versions below 3.0.0 (again without specifying) as it uses the deprecated `np.typeDict`. h5py is a requirement of tensorflow.
- Upgrade tensorflow
- Limit scikit-learn version <2
- `ConstantTimeseriesRegressor` now fills nan values in input data with zero before calling `preprocess_fit` in order to successfully (by)pass validation from `BaseTimeseriesRegressor`. Apart from scikit-learn compatibility, the input data is not actually used when fitting.
- `ConstantTimeseriesRegressor` no longer checks dtypes of input data, nor nan/inf values, as the input is only used to determine the shape of the predictions.
- Updated `BaseTimeseriesRegressor.get_feature_names_out()` so that, if the feature engineer is a `Pipeline`, it returns the names from the last `ColumnTransformer`, if available
- Updated wrong types in `quantile_plot.py`
- Properly included datasets so `load_rainbow_beach()` and `load_sewage_data()` work
- Fixed a bug where `data_sources.weather` was not installed.
- Added logo to README and documentation
- Added Lasso example to documentation
- Add `pytest --doctest-modules ./sam*` to `unittest.yml` in github actions workflows to test all docstring examples.
- Fixed all docstring examples (using `pytest --doctest-modules ./sam*`).
- Some bugfixes for SHAP and feature importance
- Updated index page of the documentation
- New class `sam.models.LassoTimeseriesRegressor` to create a Lasso regression model for time series data incl. quantile predictions.
- New class `sam.preprocessing.ClipTransformer` to clip input values to the range from the train set, making models more robust against outliers.
- New abstract base class `sam.validation.BaseValidator` for all validators.
- Renamed `sam.validation.RemoveFlatlines` to `sam.validation.FlatlineValidator`. `sam.validation.RemoveFlatlines` is still available, but will be removed in a future version.
- Renamed `sam.validation.RemoveExtremeValues` to `sam.validation.MADValidator`. `sam.validation.RemoveExtremeValues` is still available, but will be removed in a future version.
- New class `sam.validation.OutsideRangeValidator` for checking / removing data outside of a range.
- New function `datetime_train_test_split` to split pandas dataframes and series based on a datetime.
- New `sam.datasets` module containing functions for loading ready-to-use datasets: `sam.datasets.load_rainbow_beach` and `sam.datasets.load_sewage_data`.
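The idea of clipping inputs to the range seen in the train set can be sketched as follows. `RangeClipper` is a hypothetical, simplified stand-in written for this illustration; it is not the `sam.preprocessing.ClipTransformer` class, whose actual interface and options may differ.

```python
from typing import Iterable, List


class RangeClipper:
    """Hypothetical stand-in: learn min/max on train data, clip new data to it."""

    def fit(self, values: Iterable[float]) -> "RangeClipper":
        values = list(values)
        self.lower_ = min(values)  # smallest value seen during training
        self.upper_ = max(values)  # largest value seen during training
        return self

    def transform(self, values: Iterable[float]) -> List[float]:
        # Values outside the train range are pulled back to the nearest bound.
        return [min(max(v, self.lower_), self.upper_) for v in values]


clipper = RangeClipper().fit([2.0, 5.0, 3.5])
clipped = clipper.transform([1.0, 4.0, 9.0])  # 1.0 and 9.0 lie outside [2.0, 5.0]
```

A model fitted on the train range then never sees inputs far outside what it was trained on.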
- Added `average_type` to `BaseTimeseriesRegressor.__init__()`.
- `MLPTimeseriesRegressor.__init__()` now passes `average_type` to `BaseTimeseriesRegressor.__init__()`.
- Update `BaseTimeseriesRegressor.score()` to account for the `self.average_type`: in case of "mean" take the MSE of the average predictions and in case of "median" take the MAE of the average predictions.
- Fixed various spelling errors in `CHANGELOG.MD` and `models`.
- Updated package dependencies for scikit-learn
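The scoring rule tied to `average_type` can be illustrated with a minimal sketch. The function `average_score` below is hypothetical and only mirrors the rule as described in the entry above (MSE for "mean", MAE for "median"); the actual `BaseTimeseriesRegressor.score()` may differ in details such as scaling or handling of missing values.

```python
from typing import Sequence


def average_score(
    y_true: Sequence[float], y_pred: Sequence[float], average_type: str = "mean"
) -> float:
    """Hypothetical helper: pick the error metric that matches the average type."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    if average_type == "mean":
        # The mean minimises squared error, so score with MSE.
        return sum(e * e for e in errors) / len(errors)
    if average_type == "median":
        # The median minimises absolute error, so score with MAE.
        return sum(abs(e) for e in errors) / len(errors)
    raise ValueError(f"Unknown average_type: {average_type}")


mse = average_score([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], average_type="mean")
mae = average_score([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], average_type="median")
```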
- Changed the DeepExplainer to the model agnostic KernelExplainer, so we can remove all the v1 dependencies on tensorflow
- Fixed pytest MPL bug by temporarily setting it to a previous version
- Data collection function `sam.data_sources.read_regenradar` now accepts `batch_size` and collects data in batches to avoid timeouts.
No changes, version bump only.
No changes, version bump only.
- New class `sam.feature_engineering.BaseFeatureEngineer` to create a default interface for feature engineering transformers.
- New class `sam.feature_engineering.FeatureEngineer` to make any feature engineering transformer from a function.
- New class `sam.feature_engineering.IdentyEngineer` to make a transformer that only passes data (does nothing). Utility for other features.
- New class `sam.feature_engineering.SimpleFeatureEngineer` for creating time series features: rolling features and time components (one-hot or cyclical)
- Utility functions `sam.models.utils.remove_target_nan` and `sam.models.utils.remove_until_first_value` for removing missing values in training data.
- Replaces `SamQuantileMLP` with new `MLPTimeseriesRegressor`, which is more general purpose. It allows providing any feature engineering transformer / pipeline. Default parameters are changed as well.
- New example notebooks and corresponding datasets for new feature engineering and model classes.
- Renamed `SPCRegressor` to `ConstantTimeseriesRegressor` for consistency. `SPCTemplate` was renamed to `ConstantTemplate` accordingly.
- Combination of `use_diff_of_y=True` and providing `y_scaler` did not work correctly. Fixed.
- Changed deprecated `lr` to `learning_rate` in `tensorflow.keras.optimizers.Adam`.
- All classes now support `get_feature_names_out` instead of `get_feature_names`, which is consistent with `scikit-learn>=1.1`.
- Updated documentation and new examples for new feature engineering and model classes.
- `data/rainbow_beach.parquet` provides a new example dataset.
- Fixed the version info for the Sphinx docs
- Moved to pyproject.toml instead of setup.py to make this package more future proof
- Removed deprecated Azure Devops pipelines
- Added `.readthedocs.yml` and `docs/requirements.txt` to include requirements for readthedocs build.
- Updated `CONTRIBUTING.md` for open source / github contribution guidelines
- Added `black` to requirements and linting pipeline
- All code reformatted with `black` and project configuration
- Revert version changes in `scikit-learn` and `tensorflow` due to compatibility issues
- `decompose_datetime()` now also accepts a timezone argument. This enables the user to use time features in another timezone. For example: if your input data is in UTC, but you expect human behaviour to be important and the model is applied in the Netherlands, you can add `Europe/Amsterdam` to `decompose_datetime` and it will convert the time from UTC to the correct local time, also taking daylight saving time into account. This only affects the feature engineering; preprocessing and postprocessing should always happen on UTC dates.
- Fixed mypy errors in decompose_datetime.py
- Updated docstring examples in decompose_datetime.py (they work now)
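To see why the timezone matters for time features, the sketch below converts UTC timestamps to `Europe/Amsterdam` using only the Python standard library. It does not call `decompose_datetime`; the helper `local_hour` is hypothetical and only shows how the extracted hour changes with daylight saving time.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_hour(utc_time: datetime, tz_name: str) -> int:
    """Hypothetical helper: hour of day of a UTC timestamp in another timezone."""
    return utc_time.astimezone(ZoneInfo(tz_name)).hour


summer_noon_utc = datetime(2023, 7, 1, 12, 0, tzinfo=timezone.utc)
winter_noon_utc = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)

summer_hour = local_hour(summer_noon_utc, "Europe/Amsterdam")  # UTC+2 in summer
winter_hour = local_hour(winter_noon_utc, "Europe/Amsterdam")  # UTC+1 in winter
```

The same UTC hour maps to a different local hour depending on the season, which is the effect a model of human behaviour would need to capture.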
- MIT License added
- Additional information in `setup.py` and `setup.cfg` for license
- Updates package dependencies to no longer use a fixed version, but instead a minimum version
- Changed logging submodule to logging_functions to prevent overwriting logging package
- Fixed some mypy errors
- Added fix for SHAP DeepExplainer: shap/shap#2189
- Fixed some deprecation warnings
- `pyproject.toml` provides settings for building package (required for PyPI)
- Additional information in `setup.py` for open source release
- `predict` method from `sam.models.ConstantTimeseriesRegressor` now accepts kwargs for compatibility. Now, swapping models with `SamQuantileMLP` with `force_monotonic_quantiles` doesn't cause a failure.
- `sam.models.QuantileMLP` requires `predict_ahead` to be int or list, and always casts it to a list. This was changed to tuples in version 2.6.0, but that caused inconsistencies and incorrect if statements.
- `sam.visualization.sam_quantile_plot` now displays quantiles in 5 decimals, a requirement from Aquasuite with larger quantiles.
- New (optional) parameters for `sam.validation.RemoveFlatlines`: `backfill` and `margin`
- Simplified `sam.validation.RemoveFlatlines` to use `pandas.DataFrame.rolling` functions
- `SamQuantileMLP.predict` now accepts `force_monotonic_quantiles` to force quantiles to be monotonic using a postprocessing step.
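One common way to make predicted quantiles monotonic after the fact is to sort them per sample. The sketch below shows that approach under the assumption that each row holds the predictions for ascending quantile levels. It is not taken from the SAM code base, and the postprocessing step in `SamQuantileMLP.predict` may work differently.

```python
from typing import List, Sequence


def force_monotonic(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: sort each row so quantile predictions never cross."""
    # Each row is assumed to be ordered by ascending quantile level,
    # e.g. the predictions for quantiles (0.1, 0.5, 0.9).
    return [sorted(row) for row in rows]


predictions = [
    [1.0, 3.0, 2.5],  # the 0.5 and 0.9 quantiles cross here
    [0.2, 0.1, 0.4],  # the 0.1 and 0.5 quantiles cross here
]
monotonic = force_monotonic(predictions)
```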
- Added a SPC model to SAM called `ConstantTimeseriesRegressor`, which uses the `SamQuantileRegressor` base class and can be used as a fall back or benchmark model
- `SamQuantileMLP` now accepts Sequence types for some of its init parameters (like quantiles, time_cyclicals etc.) and the default value is changed to tuples to prevent the infamous "Mutable default argument" issue.
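The "Mutable default argument" issue mentioned above is a general Python pitfall. The sketch below reproduces it with two hypothetical functions that are not part of SAM: a list default is shared between calls, while a tuple default is safe.

```python
def append_quantile_bad(quantile, quantiles=[]):
    # The default list is created once, when the function is defined,
    # so every call without an explicit argument mutates the same object.
    quantiles.append(quantile)
    return quantiles


def append_quantile_good(quantile, quantiles=(0.1, 0.9)):
    # A tuple default cannot be mutated; a fresh list is built on each call.
    return list(quantiles) + [quantile]


first_bad = append_quantile_bad(0.5)
second_bad = append_quantile_bad(0.9)  # same list object as first_bad

first_good = append_quantile_good(0.5)
second_good = append_quantile_good(0.5)  # independent of first_good
```

With the list default, state leaks from one call to the next; with the tuple default, repeated calls give identical, independent results.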
- Added a new abstract base class for all SAM models called `SamQuantileRegressor`, that contains some required abstract methods (fit, train, score, dump, load) any subclass needs to implement as well as some default implementations like a standard feature engineer. `SamQuantileMLP` is now a subclass of this new abstract base class, new classes will follow soon.
- `sam.visualization._evaluate_performance` now checks for nan in both `y_hat` and `y_true`.
- `sam.visualization.performance_evaluation_fixed_predict_ahead` accepts `metric` parameter that indicates what metric to evaluate the performance with: 'R2' or 'MAE' (mean absolute error). Default metric is 'R2'.
- No more bandit linting errors: replace `assert` statements
- Remove faulty try-except-pass constructions
- Functions `sam.utils.contains_nan` and `sam.utils.assert_contains_nan` are added for validation
- Scikit-learn version had to be <0.24.0 for certain features, TODO: update dependencies in the near future
- Updated README, setup.py and CONTRIBUTING in preparation for going open-source.
- `LinearQuantileRegression` only contains parameters and pvalues, and data is no longer stored in the class. This was unwanted.
- `LinearQuantileRegression` accepts `fit_intercept` parameter, similar to `sklearn.LinearRegression`.
- `read_knmi_station_data`: Added a with statement to close the API connection, which caused errors if used too many times
- Removed all deprecated functions, see next subsection for details. All deprecated tests have been removed as well.
- All docstrings have been checked and (if needed) updated
- Type hinting in all files
- Linting changes:
- Changed pipeline linter to flake8
- Formatted all files in black
- Split large classes and functions to satisfy a maximum cyclomatic complexity of 10
- Moved inline imports to top of file if the packages were already imported by (any) parent
- Sorted imports
- Updated the `README.MD` and `CONTRIBUTING.MD` files
- `sam.data_sources`
  - Deleted deprecated function `sam.data_sources.create_synthetic_timeseries`
- `sam.feature_engineering`
  - Reduced duplicate code in `sam.feature_engineering.automatic_rolling_engineering` and `sam.feature_engineering.decompose_datetime`
  - `sam.feature_engineering.automatic_rolling_engineering`: all dataframe inputs must be linearly increasing in time and have a datetime index, if not an AssertionError is raised
  - Deleted deprecated function `sam.feature_engineering.build_timefeatures`
  - Moved hardcoded data in `sam.feature_engineering.tests.test_automatic_feature_engineering` to separate `test_data` parent folder
- `sam.feature_selection`
  - This subpackage is removed, as it was deprecated
- `sam.models`
  - Reduced complexity of `sam.models.SamQuantileMLP` by adding extra internal methods for large methods
- `sam.preprocessing`
  - Removed merge conflict files `sam.preprocessing\tests\test_scaling.py.orig` and `sam.preprocessing\data_scaling.py.orig`
  - Deleted deprecated function `sam.preprocessing.complete_timestamps`
- `sam.train_models`
  - This subpackage is removed, as it was deprecated
- `sam.utils`
  - Deleted deprecated functions: `sam.utils.MongoWrapper`, `sam.utils.label_dst`, `sam.utils.average_winter_time`, and `sam.utils.unit_to_seconds`
  - Added new function `sam.utils.has_strictly_increasing_index` to validate the datetime index of a dataframe
- `sam.visualization`
  - Reduced complexity of `sam.visualization.sam_quantile_plot` by splitting the static and interactive plot in separate functions.
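The requirement that inputs be increasing in time can be checked along the following lines. This is a sketch on a plain list of timestamps; the function `is_strictly_increasing` is hypothetical and is not the implementation of `sam.utils.has_strictly_increasing_index`, which operates on a dataframe index.

```python
from datetime import datetime
from typing import Sequence


def is_strictly_increasing(timestamps: Sequence[datetime]) -> bool:
    """Hypothetical check: every timestamp is later than the one before it."""
    return all(
        earlier < later for earlier, later in zip(timestamps, timestamps[1:])
    )


ordered = [datetime(2021, 1, 1, h) for h in (0, 1, 2)]
duplicated = [datetime(2021, 1, 1, h) for h in (0, 1, 1)]  # repeated timestamp
shuffled = [datetime(2021, 1, 1, h) for h in (0, 2, 1)]  # out of order
```

Both duplicated and out-of-order timestamps fail the check, since rolling features assume each row follows the previous one in time.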
- `sam.data_sources.read_knmi_station_data` was added to get KNMI data for a selection of KNMI station numbers
- `sam.data_sources.read_knmi_stations` was added to get all automatic KNMI station meta data
- `sam.data_sources.read_knmi` was changed because of a new KNMI API. The package `knmy` does not work anymore.
- `knmy` is no longer a (optional) dependency (outdated)
- `sam.visualization.quantile_plot` accepts `benchmark` parameter that plots the benchmark used to calculate the model performance
- `sam.preprocessing.sam_reshape.sam_format_to_wide` now explicitly defines the arguments when calling `pd.pivot_table`
- `sam.metrics.r2_calculation.train_r2` can now use an array as a benchmark, not only a scalar average, for r2 calculation
- `sam.visualization.performance_evaluation_fixed_predict_ahead` accepts `train_avg_func` parameter that provides a function to calculate the average of the train set to use for r2 calculation (default=np.nanmean)
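An r2 score relative to a benchmark compares the model's squared error with the squared error of the benchmark prediction. The sketch below accepts either a scalar (such as the train set mean) or an array as benchmark. The function `benchmark_r2` is hypothetical and is written under the assumption that the score is defined as one minus the ratio of the two sums of squares; the actual `train_r2` implementation may differ, for example in how missing values are handled.

```python
from typing import Sequence, Union


def benchmark_r2(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    benchmark: Union[float, Sequence[float]],
) -> float:
    """Hypothetical helper: r2 of predictions relative to a benchmark."""
    if isinstance(benchmark, (int, float)):
        # A scalar benchmark (e.g. the train mean) is repeated for every sample.
        benchmark = [float(benchmark)] * len(y_true)
    ss_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_benchmark = sum((t - b) ** 2 for t, b in zip(y_true, benchmark))
    return 1.0 - ss_model / ss_benchmark


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

r2_scalar = benchmark_r2(y_true, y_pred, benchmark=2.0)  # train mean as benchmark
r2_array = benchmark_r2(y_true, y_pred, benchmark=[1.0, 2.0, 3.0, 3.0])
```

A score of 0 means the model is no better than the benchmark, which is why a stronger array benchmark lowers the score for the same predictions.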
- Name change: `sam.metrics.train_mean_r2` -> `sam.metrics.r2_calculation` to avoid circular import errors and the file now contains multiple methods
- New function: `sam.metrics.r2_calculation.train_r2`, a renamed copy of `sam.metrics.r2_calculation.train_mean_r2`, as any average can now be used for r2 calculation
- `sam.metrics.train_mean_r2` is now deprecated and calls `sam.metrics.train_r2`
- `sam.data_sources.read_knmi` now accepts parameter `preprocessing` to transform data to more scales.
- `keras_joint_mae_tilted_loss`: to fit the median in quantile regression (use average_type='median' in SamQuantileMLP)
- `plot_feature_importances`: bar plot of feature importances (e.g. computed in SamQuantileMLP.quantile_feature_importances)
- `compute_quantile_ratios`: to check the proportion of data falling beneath certain quantile
- eli5 uses the sklearn.metrics.scorer module, which is gone in 0.24.0, so we need <=0.24.0
- shap does not work with tensorflow 2.4.0 so we need <=2.3.1
- statsmodels is no longer a dependency (dependency introduced in version 2.0.19)
- `sam.metrics.tilted_loss`: A tilted loss function that works with numpy / pandas
- `sam.models.LinearQuantileRegression`: sklearn style wrapper for quantile regression using statsmodels
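The tilted (pinball) loss used in quantile regression weights under- and over-predictions differently depending on the quantile. The sketch below uses the standard textbook definition in plain Python. The function `pinball_loss` is hypothetical; the signature and averaging of `sam.metrics.tilted_loss` may differ.

```python
from typing import Sequence


def pinball_loss(
    y_true: Sequence[float], y_pred: Sequence[float], quantile: float
) -> float:
    """Hypothetical helper: mean tilted loss for a single quantile level."""
    losses = []
    for t, p in zip(y_true, y_pred):
        error = t - p
        # Under-prediction (error > 0) costs quantile * error,
        # over-prediction (error < 0) costs (quantile - 1) * error.
        losses.append(max(quantile * error, (quantile - 1.0) * error))
    return sum(losses) / len(losses)


y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 1.0]

loss_high = pinball_loss(y_true, y_pred, quantile=0.9)
loss_low = pinball_loss(y_true, y_pred, quantile=0.1)
```

For a high quantile, predicting too low is penalised more heavily than predicting too high, which pushes the fitted value upwards.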
- `sam.models.SamQuantileMLP`: Now stores the input columns (before featurebuilding) which can be accessed by `get_input_cols()`
- `sam.validation.flatline`: Now accepts `window="auto"` option, for which the maximum flatline window is estimated in the `fit` method
- New class: `sam.feature_engineering.SPEITransformer` for computing precipitation and evaporation features
- Fixed failing unit tests by removing tensorflow v1 code
- Fixed QuantileMLP, where the target would stay an integer, which fails with our custom loss functions
- Updated optional dependencies to everything we use
- With the latest pandas version a UTC to string conversion has been fixed. Removed our fix, upped the pandas version
- Updated scikit-learn to at least 0.21, which is required for the iterative imputer
- Added `run-linting.yml` to run pycodestyle in devops pipelines
- Added `run-unittest.yml` to run pytest in devops pipelines
- Removed `.arcconfig` (old arcanist unit test configuration)
- Removed `.arclint` (old arcanist lint configuration)
- `sam.visualisation.sam_quantile_plot`: Options to set `outlier_window` and `outlier_limit`, to only plot anomalies when at least `outlier_limit` anomalies are counted within the `outlier_window`
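The windowed filtering of anomalies described above can be sketched as a rolling count. This is an illustration only: the function `filter_outliers` is hypothetical, and it assumes a trailing window that ends at the current sample, which may not match how `sam_quantile_plot` positions its window.

```python
from typing import List, Sequence


def filter_outliers(
    is_outlier: Sequence[bool], outlier_window: int, outlier_limit: int
) -> List[bool]:
    """Hypothetical helper: keep an outlier only if enough outliers are nearby."""
    kept = []
    for i, flag in enumerate(is_outlier):
        # Assumption: the window covers the current sample and the
        # (outlier_window - 1) samples before it.
        start = max(0, i - outlier_window + 1)
        count = sum(is_outlier[start : i + 1])
        kept.append(bool(flag) and count >= outlier_limit)
    return kept


flags = [True, False, False, True, True, True, False]
kept = filter_outliers(flags, outlier_window=3, outlier_limit=2)
```

The isolated outlier at the start is suppressed, while the run of consecutive outliers is kept once it reaches the limit.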
- Bugfix in `sam.metrics.custom_callbacks`
- `sam.models.SamQuantileMLP.score`: if using y_scaler, now scales actual and prediction to equalize score to keras loss
- `sam.models.SamQuantileMLP.quantile_feature_importances`: now has argument sum_time_components that summarizes feature importances for different features generated for a single component (i.e. in onehot encoding).
- `sam.feature_engineering.automatic_rolling_engineering`: `estimator_type` argument can now also be 'bayeslin', which should be used if one hot components are used
- `sam.feature_engineering.automatic_rolling_engineering`: constant features are no longer deleted (broke one hot features)
- `sam.models.SamQuantileMLP`: When using y_scaler, name of rescaled y-series is set correctly.
- `sam.models.SamQuantileMLP`: Now accepts a keyword argument `r2_callback_report` to add the new custom r2 callback.
- `sam.metrics.custom_callbacks`: Added a custom callback that computes r2 with `sam.metrics.train_mean_r2` for each epoch
- `sam.validation.create_validation_pipe`: the imputation part is now correctly applied only to the `cols` columns in the df
- `sam.metrics.train_mean_r2`: now only adds non-nan values in np.arrays (previously would return nan R2)
- `sam.visualization.quantile_plot`: now accepts custom outliers with 'outlier' argument
- `sam.visualization.quantile_plot`: now correctly shifts y_hat with predict_ahead
- New function: `sam.metrics.train_mean_r2` that evaluates r2 based on the train set mean
- New function: `sam.visualization.performance_evaluation_fixed_predict_ahead` that evaluates model performance with certain predict ahead.
- `sam.feature_engineering.automatic_rolling_engineering` now has new argument 'onehots'. The argument 'add_time_features' is now removed, as 'cyclicals' and 'onehots' now together make up both timefeatures
- `sam.feature_engineering.decompose_datetime` 'components' argument now supports 'secondofday'
- `sam.visualization.quantile_plot` 'score' argument changed to 'title' to enhance generalizability
- New function: `sam.visualization.quantile_plot` function creates an (interactive) plot of SamQuantileMLP output
- `sam.feature_engineering.decompose_datetime` now has a new argument 'onehots' that converts time variables to one-hot-encoded
- `sam.feature_engineering.BuildRollingFeatures`: now has an argument 'add_lookback_to_colname'
- `sam.models.SamQuantileMLP`: now has argument 'time_onehots', default time variables adjusted accordingly
- `sam.models.SamQuantileMLP`: now has argument 'y_scaler'
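The difference between one-hot and cyclical encoding of a time component can be shown in a few lines. Both helpers below are hypothetical and use only the standard library; they do not reproduce the column naming or options of `decompose_datetime` or `recode_cyclical_features`.

```python
import math
from typing import List, Tuple


def onehot(value: int, maximum: int) -> List[int]:
    """Hypothetical helper: one column per possible value, a single 1."""
    return [1 if i == value else 0 for i in range(maximum)]


def cyclical(value: int, maximum: int) -> Tuple[float, float]:
    """Hypothetical helper: map a value onto the unit circle (sin, cos)."""
    angle = 2.0 * math.pi * value / maximum
    return math.sin(angle), math.cos(angle)


quarter_onehot = onehot(2, 4)  # third of four categories
sin_6h, cos_6h = cyclical(6, 24)  # 06:00 is a quarter of the way round the day

# Hours 23 and 0 are adjacent on the circle, unlike their raw integer values.
sin_23, cos_23 = cyclical(23, 24)
sin_0, cos_0 = cyclical(0, 24)
gap = math.hypot(sin_23 - sin_0, cos_23 - cos_0)
```

Cyclical encoding needs the maximum of the component (24 for hours) and keeps neighbouring times close together, while one-hot encoding treats every value as an unrelated category.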
- `sam.models.SamQuantileMLP`: setting use_y_as_feature to True would give error if predict ahead was 0.
- New function: `sam.models.create_keras_autoencoder_mlp` function that returns keras MLP for unsupervised anomaly detection
- New function: `sam.models.create_keras_autoencoder_rnn` function that returns keras RNN for unsupervised anomaly detection
- Change `sam.models.create_keras_quantile_mlp`: supports momentum of 1.0 for no batch normalization. Value of None is still supported.
- Change `sam.models.create_keras_quantile_rnn`: supports lower case layer types 'lstm' and 'gru'
A lot changed in version 2.0.0. Only changes compared to 1.0.3 are listed here. For more details about any function, check the documentation.
- `sam.preprocessing.RecurrentReshaper` transformer to transform 2d to 3d for Recurrent Neural networks
- `sam.preprocessing.scale_train_test` function that scales train and test set and returns fitted scalers
- `sam.validation.RemoveFlatlines` transformer that finds and removes flatlines from data
- `sam.validation.RemoveExtremeValues` transformer that finds and removes extreme values
- `sam.validation.create_validation_pipe` function that creates sklearn pipeline for data validation
- `sam.preprocessing.make_differenced_target` and `sam.preprocessing.inverse_differenced_target` allow for differencing a timeseries
- `sam.models.SamQuantileMLP` standard model for fitting wide-format timeseries data with an MLP
- `sam.models.create_keras_quantile_rnn` function that returns a keras RNN model that can predict means and quantiles
- Functions for benchmarking a model on some standard data (in sam format): `sam.models.preprocess_data_for_benchmarking`, `sam.models.benchmark_model`, `sam.models.plot_score_dicts`, `sam.models.benchmark_wrapper`
- `sam.feature_engineering.AutomaticRollingEngineering` transformer that calculates rolling features in a smart way
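Differencing a timeseries and inverting it again, as offered by the differencing helpers listed above, follows a simple pattern. The two functions below are hypothetical and show first-order differencing on a plain list only; `make_differenced_target` and `inverse_differenced_target` have their own signatures, which are not reproduced here.

```python
from typing import List, Sequence


def difference(values: Sequence[float]) -> List[float]:
    """Hypothetical helper: change between each value and its predecessor."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def undifference(first_value: float, diffs: Sequence[float]) -> List[float]:
    """Hypothetical helper: rebuild the series from its start and the changes."""
    restored = [first_value]
    for diff in diffs:
        restored.append(restored[-1] + diff)
    return restored


series = [10.0, 12.0, 11.0, 15.0]
diffs = difference(series)
restored = undifference(series[0], diffs)
```

A model can then predict the changes instead of the raw level, and the level is recovered afterwards from the last known value.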
- `sam.data_sources.read_knmi` has an option to use a nearby weather station if the closest weather station contains nans
- `sam.exploration.lag_correlation` now accepts a list as the `lag` parameter
- `sam.visualization.plot_lag_correlation` looks better now
- `sam.recode_cyclical_features` now explicitly requires maximums and provides them for time features
- Added example for SamQuantileMLP at http://10.2.0.20/sam/examples.html#samquantilemlp-demo
- `sam.preprocessing.sam_format_to_wide` didn't work on pandas 0.23 and older
- `sam.exploration.lag_correlation` did not correctly use the correlation method parameter
- `sam.metrics.keras_tilted_loss` caused the entire package to crash if tensorflow wasn't installed
- `sam.visualization.plot_incident_heatmap` did not correctly set the y-axis
- `sam.feature_engineering.BuildRollingFeatures` threw a deprecationwarning on newer versions of pandas
- General fixes to typos and syntax in the documentation
Added new functions: `keras_joint_mse_tilted_loss`, `create_keras_quantile_mlp`
Change `decompose_datetime` and `recode_cyclical_features`: the `remove_original` argument has been deprecated and renamed to `remove_categorical`. The original name was wrong, since this parameter never removed the original features, but only the newly created categorical features.
Change `decompose_datetime` and `recode_cyclical_features`: a new parameter `keep_original` has been added. This parameter behaves the same as in `BuildRollingFeatures`: it is True by default, but can be set to False to keep only the newly created features.
Add new functions: `keras_tilted_loss`, `keras_rmse`, `get_keras_forecasting_metrics`.
Improve `read_regenradar`: it now allows optional arguments to be passed directly to the lizard API. Unfortunately, as of now, we still don't have access to lizard API documentation, so usefulness of this new feature is limited.
Change `normalize_timestamps` signature and defaults. No UserWarning was given because the previous version was so broken that it needed to be fixed asap
Change `correct_outside_range`, `correct_below_threshold`, `correct_above_threshold` to accept series instead of a dataframe. The old behavior can be recreated: given `df` with column `TARGET`, the old behavior was `df = correct_outside_range(df, 'TARGET')`, equivalent new code is `df['TARGET'] = correct_outside_range(df['TARGET'])`.
Change `correct_outside_range`, `correct_below_threshold`, `correct_above_threshold` to ignore missing values completely. Previously, missing values were treated as outside the range.
Added new functions: `sam_format_to_wide`, `wide_to_sam_format`, `FunctionTransformerWithNames`
First release.