Implementation of basic PySpark data preprocessing methods #13

xandaau · 2023-01-15T16:20:44Z

For preprocessing pandas data and speeding up experiments, we have the Preprocessor class and a number of base classes, each implementing a single preprocessing step.
These methods should also be implemented for Spark DataFrames, following the same approach we use for the Designer and the Splitter.
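One possible way to let a single-purpose step accept both pandas and Spark DataFrames is sketched below. We have not confirmed how the Designer and the Splitter actually handle Spark, so `BackendAwareStep` and `is_spark_dataframe` are hypothetical names rather than the project's API. The sketch assumes each step provides one implementation per backend and routes on the type of the incoming DataFrame.

```python
from abc import ABC, abstractmethod
from typing import Any


def is_spark_dataframe(df: Any) -> bool:
    """Detect a PySpark SQL DataFrame without importing pyspark.

    Checks the module that defines the object's class, so pandas-only
    installs never need pyspark. Classic and Spark Connect DataFrames both
    live under "pyspark.sql."; pandas-on-Spark frames ("pyspark.pandas.")
    are deliberately not matched.
    """
    return type(df).__module__.startswith("pyspark.sql.")


class BackendAwareStep(ABC):
    """Hypothetical base for a single preprocessing step with one
    implementation per DataFrame backend."""

    def transform(self, df: Any) -> Any:
        # Route to the Spark implementation only for PySpark SQL DataFrames.
        if is_spark_dataframe(df):
            return self._transform_spark(df)
        return self._transform_pandas(df)

    @abstractmethod
    def _transform_pandas(self, df: Any) -> Any:
        """pandas implementation of the step."""

    @abstractmethod
    def _transform_spark(self, df: Any) -> Any:
        """PySpark implementation of the step."""
```

Checking the module name instead of calling `isinstance` keeps pyspark an optional dependency. pandas-on-Spark frames fall through to the pandas path because they mimic the pandas API; whether that is appropriate depends on which pandas calls a given step relies on.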

At the moment, the following methods are essential:

Aggregation
Outlier removal (robust)
CUPED


xandaau · 2023-01-31T15:44:52Z

The architecture of the preprocessing classes added in #22 still does not account for a possible PySpark implementation.

xandaau added the enhancement (New feature or request) label on Jan 15, 2023

xandaau mentioned this issue on Mar 21, 2024 in PySpark support for Cuped class #49 (Open)
