Step through each feature to determine what preprocessing should be done, then automatically create and fit a pipeline. The processing pipeline is based on sklearn and includes transformers from the tubular package.
TO-DO:
- Simplify class to not retain a copy of the dataframe. Done 5/12/2023: the user now has to pass in the dataframe as a parameter (makes saving the preprocessor in a pickle file less bloated).
- Allow the user to specify capping. Updated 5/12/2023: the user can go through each specific feature to set different capping values, or select quantiles for all features.
- Update response variable to be used for categorical encoding (one-hot, ordinal encoding, etc.)
- Create a version that only uses sklearn preprocessing methods (in case tubular is unavailable)
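
As a rough illustration of the sklearn-only item above, the following sketch caps numeric features at fitted quantiles and one-hot encodes categorical features without using tubular. It is not this project's implementation: `QuantileCapper`, `build_preprocessor`, the column names, and the quantile values are all hypothetical names and settings chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class QuantileCapper(BaseEstimator, TransformerMixin):
    """Hypothetical helper: clip each numeric column to quantiles learned in fit."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn one lower and one upper cap per column, ignoring missing values.
        self.lower_ = np.nanquantile(X, self.lower, axis=0)
        self.upper_ = np.nanquantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Only the learned caps are stored on the object, not the training data.
        return np.clip(X, self.lower_, self.upper_)


def build_preprocessor(numeric_cols, categorical_cols, lower=0.01, upper=0.99):
    """Assemble an unfitted pipeline from sklearn components only."""
    columns = ColumnTransformer(
        [
            ("cap", QuantileCapper(lower=lower, upper=upper), numeric_cols),
            ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0,  # always return a dense array
    )
    return Pipeline([("columns", columns)])


# Toy data, made up for the example; the dataframe is passed in, not stored.
df = pd.DataFrame(
    {
        "income": [10.0, 20.0, 30.0, 40.0, 1000.0],
        "region": ["n", "s", "n", "e", "s"],
    }
)
preprocessor = build_preprocessor(["income"], ["region"], lower=0.0, upper=0.75)
transformed = preprocessor.fit_transform(df)
```

Because the fitted pipeline keeps only the learned caps and category lists, pickling it should stay small, in line with the first TO-DO item. Per-feature capping values could be supported by adding one `QuantileCapper` entry per column to the `ColumnTransformer`, each with its own quantiles.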