Implementation of basic PySpark data preprocessing methods #13

Open
xandaau opened this issue Jan 15, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

Collaborator

xandaau commented Jan 15, 2023

To preprocess pandas data and speed up experiments, we have the Preprocessor class and a number of single-purpose base classes under preprocessing.
The same methods should be implemented for Spark DataFrames, following the paradigm we already use for the Designer and the Splitter.

For now, implementing the following methods is essential:

  1. Aggregation
  2. Outlier removal (robust)
  3. CUPED
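
As a rough reference for what the Spark versions of items 1 and 3 would need to compute, here is a minimal plain-Python sketch. It is not the project's API: the function names `aggregate_by_user` and `cuped` and the column names `user_id`, `spend`, and `pre_spend` are hypothetical, and CUPED is written in its common textbook form, adjusted = target - theta * (covariate - mean(covariate)) with theta = cov(target, covariate) / var(covariate). Robust outlier removal is left out because the issue does not say which rule is meant.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_user(rows, key="user_id"):
    """Sum every non-key column per key value (a stand-in for group-by + sum)."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for col, value in row.items():
            if col != key:
                totals[row[key]][col] += value
    return {k: dict(cols) for k, cols in totals.items()}


def cuped(target, covariate):
    """Return the target metric adjusted by a pre-experiment covariate.

    theta = cov(target, covariate) / var(covariate)
    adjusted_i = target_i - theta * (covariate_i - mean(covariate))
    """
    t_mean, c_mean = mean(target), mean(covariate)
    # The 1 / (n - 1) factors of covariance and variance cancel in theta.
    cov = sum((t - t_mean) * (c - c_mean) for t, c in zip(target, covariate))
    var = sum((c - c_mean) ** 2 for c in covariate)
    if var == 0:
        # A constant covariate carries no information; leave the target as is.
        return list(target)
    theta = cov / var
    return [t - theta * (c - c_mean) for t, c in zip(target, covariate)]


# Hypothetical event-level data: one row per user event.
rows = [
    {"user_id": "a", "spend": 1.0, "pre_spend": 2.0},
    {"user_id": "a", "spend": 2.0, "pre_spend": 0.0},
    {"user_id": "b", "spend": 5.0, "pre_spend": 1.0},
]
per_user = aggregate_by_user(rows)
adjusted = cuped(
    [u["spend"] for u in per_user.values()],
    [u["pre_spend"] for u in per_user.values()],
)
```

In PySpark the same quantities could plausibly be obtained with `groupBy(...).agg(...)` for the aggregation and `pyspark.sql.functions.covar_samp` and `var_samp` for theta, but how that fits the Designer/Splitter paradigm is a design decision for this issue.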
xandaau added the enhancement label on Jan 15, 2023
Collaborator Author

xandaau commented Jan 31, 2023

The architecture of the preprocessing classes added in #22 still does not account for a possible PySpark implementation.
