[BUG] Many Feature-extraction improvement #115

Closed
MiguelGuima opened this issue Mar 16, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@MiguelGuima

Good morning.

In a research project, we are developing an approach to predict the error of future models in a streaming data scenario, in which data is always arriving and we need to decide when to retrain a model so that it does not become obsolete.

For that purpose, we are extracting as many features as we can with the pymfe package (thank you so much for your work, and please keep up the beautiful work). We came up with the idea of using as many datasets as we can for the meta-feature extraction, so that we might predict the error for any kind of dataset instead of for a single dataset covering only one problem/domain.

In the present work we are investigating this possibility with regression models for now, and we are asking for help: we would like to extract as many meta-features as we can (we are aware of the dimensionality problem and apply strategies to deal with it later) and, in some automated way, ignore the ones that cannot be extracted, like in Fig.a, instead of selecting specific meta-features or groups, if that is possible.

For example, we would like to avoid the error in Fig.b when extracting meta-features from the attached dataset with mfe = MFE(groups="all", summary="all").

Fig.a

Fig.b
Dataset: 2019.zip
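A minimal sketch of the automated "extract everything, ignore what fails" approach described above, assuming each dataset is already loaded as an `(X, y)` pair in a dictionary. The helpers `collect_meta_features`, `common_features` and `pymfe_extract` are hypothetical, not part of pymfe; only `MFE(groups="all", summary="all")`, `fit` and `extract` come from pymfe itself. The sketch skips datasets whose extraction raises, drops meta-features whose value is NaN or otherwise not a finite number, and keeps only the meta-features available for every dataset. It isolates failures; it does not fix their underlying cause.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical alias: an extractor takes (X, y) and returns (names, values).
Extractor = Callable[[object, object], Tuple[Sequence[str], Sequence[object]]]


def collect_meta_features(
    datasets: Dict[str, Tuple[object, object]], extract_fn: Extractor
) -> Tuple[List[Tuple[str, Dict[str, float]]], List[Tuple[str, str]]]:
    """Extract meta-features per dataset, skipping failures and non-finite values."""
    rows: List[Tuple[str, Dict[str, float]]] = []
    failed: List[Tuple[str, str]] = []
    for name, (X, y) in datasets.items():
        try:
            names, values = extract_fn(X, y)
        except Exception as exc:  # the whole extraction failed for this dataset
            failed.append((name, repr(exc)))
            continue
        row: Dict[str, float] = {}
        for ft_name, value in zip(names, values):
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # not convertible to a number: treat as "not extracted"
            if math.isfinite(number):  # drops NaN and +/-inf
                row[ft_name] = number
        rows.append((name, row))
    return rows, failed


def common_features(rows: List[Tuple[str, Dict[str, float]]]) -> List[str]:
    """Return the meta-feature names that were extracted for every dataset."""
    if not rows:
        return []
    keys = set(rows[0][1])
    for _, row in rows[1:]:
        keys &= set(row)
    return sorted(keys)


def pymfe_extract(X, y):
    """Example extractor using pymfe (imported lazily, only when called)."""
    from pymfe.mfe import MFE

    mfe = MFE(groups="all", summary="all")
    mfe.fit(X, y)
    return mfe.extract()  # (names, values)
```

With real data one would call `collect_meta_features(datasets, pymfe_extract)` and build the meta-dataset from the columns in `common_features(rows)`. Intersecting across all datasets can be aggressive when many meta-features fail on only a few datasets; imputing missing values is an alternative design choice.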

Thanks again for your excellent work!

MiguelGuima added the bug (Something isn't working) label on Mar 16, 2021
@FelSiq
Collaborator

FelSiq commented Mar 16, 2021

Hi @MiguelGuima,

It seems your data consists of regression problems, correct? If so, pymfe does not currently support regression tasks, only classification problems, so errors during the meta-feature extraction process are expected.

Alternatives could be to either 1) discretize the dependent attribute y manually before fitting the pymfe extractor, or 2) use the beta version of ts-pymfe, which is intended for time series and may also work with data streams.
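A minimal sketch of option 1, assuming the continuous target is available as a plain Python sequence. It uses equal-frequency (quantile) binning, which is only one possible discretization; the helper name `discretize_target` and the default `n_bins=4` are assumptions, not something pymfe provides. Keep in mind that the extracted meta-features then describe the discretized classification problem rather than the original regression task.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize_target(y: Sequence[float], n_bins: int = 4) -> List[int]:
    """Map continuous targets to class labels 0..n_bins-1 via equal-frequency bins.

    Equal values always receive the same label; with many ties, some bins may
    end up empty, so fewer than n_bins distinct labels can appear.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if len(y) == 0:
        return []
    ordered = sorted(y)
    n = len(ordered)
    # Interior cut points at the k/n_bins quantiles (k = 1..n_bins-1).
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # bisect_right counts how many cut points are <= value, i.e. the bin index.
    return [bisect_right(edges, value) for value in y]


y = [0.1, 0.5, 0.2, 0.9, 0.7, 0.3, 0.8, 0.4]
print(discretize_target(y, n_bins=4))  # [0, 2, 0, 3, 2, 1, 3, 1]
```

The labels would then be passed in place of the original target, e.g. `mfe.fit(X, discretize_target(y))`. Since `n_bins` changes the resulting class structure, it is probably best kept fixed across datasets whose meta-features are meant to be compared.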

Possibly related issue: #98.

@MiguelGuima
Author

Thanks for the quick response!

I didn't notice that pymfe doesn't work with regression problems...

I am sure I will find a way around :)

Thanks again
