
[ENH] online update capability for probabilistic regressors #462

Merged
merged 18 commits on Sep 27, 2024

Conversation

@fkiraly (Collaborator) commented on Sep 13, 2024

Adds framework support for online update capability for probabilistic regressors, and a simple composite strategy that refits on all data, for testing the framework. Closes #463

Contains:

  • extension of the regressor and survival base classes with an update / _update method for batch updates
  • addition of a capability:online tag for the respective estimators
  • addition of a composite OnlineRefit which adds the capability:online tag and refits the regressor on all data seen so far. This is kept as a separate estimator so that not every estimator has to remember the data (and clutter self with it); a usage sketch follows below
  • a similar composite OnlineDontRefit that turns off the online capability
  • a specific test case for online updates, in TestAllRegressors
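
For illustration, a rough usage sketch of the new interface, wrapping an existing skpro regressor in the composite; the OnlineRefit import path is assumed and details may differ from the merged code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

from skpro.regression.residual import ResidualDouble
from skpro.regression.online import OnlineRefit  # import path assumed

# two toy data batches arriving one after the other
X1 = pd.DataFrame(np.random.randn(50, 3), columns=["a", "b", "c"])
y1 = pd.DataFrame({"y": X1.sum(axis=1) + np.random.randn(50)})
X2 = pd.DataFrame(np.random.randn(20, 3), columns=["a", "b", "c"], index=range(50, 70))
y2 = pd.DataFrame({"y": X2.sum(axis=1) + np.random.randn(20)})

# wrap a plain (non-online) probabilistic regressor in the composite;
# OnlineRefit remembers all data seen so far and refits on each update
reg = OnlineRefit(ResidualDouble(LinearRegression()))
reg.fit(X1, y1)

# a new batch arrives - update refits on the union of both batches
reg.update(X2, y2)

y_pred_proba = reg.predict_proba(X2)  # distribution-valued prediction
```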

@fkiraly added the labels enhancement, module:regression (probabilistic regression module), implementing algorithms (implementing algorithms, estimators, objects native to skpro), and implementing framework (implementing or improving framework for learning tasks, e.g. base class functionality) on Sep 13, 2024
@fkiraly (Collaborator, Author) commented on Sep 13, 2024

FYI @simon-hirsch, @BerriJ - this extends the framework to add online methods :-)

@simon-hirsch

Looks generally quite cool to me 👍

Generally, I think putting the "remember old data and fit on the union of new data and old data" strategy in a separate estimator is a good thing, as it is potentially dangerous with respect to the disk space that an estimator saved with pickle / joblib / ... takes up, and it will slow down the storing/loading of models.

For testing, you might want to use a TimeSeriesSplit instead of the random train_test_split. For exact online learning methods, one could even test whether the update indeed leads to the same result as a repeated batch fit; this is of course trickier for approximate methods like SGD.
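
A rough sketch of the equivalence check suggested above (not the actual test added in the PR); it emulates an exact online method via the OnlineRefit composite, whose import path is assumed:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

from skpro.regression.residual import ResidualDouble
from skpro.regression.online import OnlineRefit  # import path assumed

X = pd.DataFrame(np.random.randn(60, 2), columns=["a", "b"])
y = pd.DataFrame({"y": X["a"] + np.random.randn(60)})

reg = OnlineRefit(ResidualDouble(LinearRegression()))

for old_idx, new_idx in TimeSeriesSplit(n_splits=3).split(X):
    X_old, y_old = X.iloc[old_idx], y.iloc[old_idx]
    X_new, y_new = X.iloc[new_idx], y.iloc[new_idx]

    # fit on the old batch, then update with the new batch
    reg_online = reg.clone().fit(X_old, y_old)
    reg_online.update(X_new, y_new)

    # plain batch fit on all data seen so far
    reg_batch = reg.clone().fit(
        pd.concat([X_old, X_new]), pd.concat([y_old, y_new])
    )

    # for exact online updates, point predictions should coincide
    np.testing.assert_allclose(
        reg_online.predict(X_new).values, reg_batch.predict(X_new).values
    )
```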

@fkiraly (Collaborator, Author) commented on Sep 19, 2024

For testing, you might want to use a TimeSeriesSplit instead of the random train_test_split.

This is just testing the interface, and imo it should not matter for the test.

Regarding the "conceptual model": unlike sklearn, we do not assume or test that regressors behave exchangeably with respect to the sample index. Once we get the first examples of regressors that assume an ordering or other types of non-exchangeability, we could simply distinguish them by tag.
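
For concreteness, a minimal sketch of how such a tag-based distinction could look, using the capability:online tag added in this PR; the order-dependence tag name is purely hypothetical and the OnlineRefit import path is assumed:

```python
from sklearn.linear_model import LinearRegression

from skpro.regression.residual import ResidualDouble
from skpro.regression.online import OnlineRefit  # import path assumed

reg = OnlineRefit(ResidualDouble(LinearRegression()))

# skbase-style tag inspection: the composite advertises online update capability
assert reg.get_tag("capability:online")

# a future non-exchangeability tag (name hypothetical) could be queried the same way:
# reg.get_tag("capability:ordered", tag_value_default=False)
```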

@fkiraly (Collaborator, Author) commented on Sep 19, 2024

Generally, I think putting the "remember old data and fit on the union of new data and old data" strategy in a separate estimator is a good thing, as it is potentially dangerous with respect to the disk space that an estimator saved with pickle / joblib / ... takes up, and it will slow down the storing/loading of models.

Agreed, I think it is already an issue with sktime forecasters; there, storing seems more important since there is no "y from X" pairing, though I'd also like to get rid of it as much as possible.

@fkiraly merged commit ba2aae5 into main on Sep 27, 2024
28 checks passed
Linked issue closed by this pull request: [ENH] online probabilistic regression