sdsc-bw/DataFactory

DataFactory

This repository introduces commonly used data mining methods, along with feature engineering methods independently researched and developed by SDSC staff. It unifies methods from different packages such as imblearn, sklearn, hyperopt and tsai. Our demos show the common data mining process and how to use the DataFactory for it.

Run (Temporary, to be removed)

Go to the root directory and run the following command to create the test report:

python usersry_01_01_dash.py --datapath=./data/dataset_31_credit-g.csv --outputpath=./results/

Preprocessing

We offer methods for data preprocessing. This includes label encoding, data balancing, sampling and dealing with NA values and outliers.
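The preprocessing steps listed above can be sketched with plain sklearn and NumPy calls. This is a minimal illustration under stated assumptions, not the DataFactory API: the toy data, the variable names, the median imputation strategy, the IQR clipping rule and the random oversampling via `sklearn.utils.resample` are all choices made for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import resample

# Toy data (made up for illustration): one missing value and one outlier.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 12.0],
    [4.0, 11.0],
    [5.0, 500.0],  # outlier in the second feature
    [6.0, 13.0],
])
labels = np.array(["good", "good", "good", "good", "bad", "bad"])

# 1. Label encoding: string classes become integers (classes sorted alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# 2. NA handling: replace missing values with the column median.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)

# 3. Outlier handling: clip each column to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(X_imputed, [25, 75], axis=0)
iqr = q3 - q1
X_clipped = np.clip(X_imputed, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 4. Balancing: randomly oversample the minority class up to the majority size.
minority = y == encoder.transform(["bad"])[0]
X_min_up, y_min_up = resample(
    X_clipped[minority],
    y[minority],
    replace=True,
    n_samples=int((~minority).sum()),
    random_state=0,
)
X_balanced = np.vstack([X_clipped[~minority], X_min_up])
y_balanced = np.concatenate([y[~minority], y_min_up])
```

Clipping is only one way to treat outliers, and random oversampling is the simplest balancing strategy. imblearn offers alternatives such as SMOTE that may suit real datasets better.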

Feature Engineering

In addition to that, we provide functions for feature engineering. This includes unary, binary and multiple transformations.
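To make the three transformation types concrete, here is a small NumPy sketch. The specific operations (log, square root, product, ratio, row-wise mean and maximum) and all names are assumptions chosen for illustration. They are not necessarily the transformations DataFactory implements.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three strictly positive toy features, so log and division are safe.
a = rng.uniform(1.0, 10.0, size=5)
b = rng.uniform(1.0, 10.0, size=5)
c = rng.uniform(1.0, 10.0, size=5)

# Unary transformations: one input feature -> one new feature.
unary = {
    "log(a)": np.log(a),
    "sqrt(a)": np.sqrt(a),
    "a^2": np.square(a),
}

# Binary transformations: two input features -> one new feature.
binary = {
    "a*b": a * b,
    "a/b": a / b,
    "a-b": a - b,
}

# Multiple transformations: several input features -> one new feature.
stacked = np.column_stack([a, b, c])
multiple = {
    "mean(a,b,c)": stacked.mean(axis=1),
    "max(a,b,c)": stacked.max(axis=1),
}

# Collect all generated features into one matrix (rows = samples).
new_features = {**unary, **binary, **multiple}
feature_matrix = np.column_stack(list(new_features.values()))
```

In practice, generated features are usually filtered afterwards, for example by correlation or model-based importance, because most candidate transformations add little signal.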

Finetuning

We also provide a finetuning method based on hyperopt.

Here is a complete list of our supported models for time series:

| Model | String | Classification | Regression | Forecasting | Hyperparameters |
| --- | --- | --- | --- | --- | --- |
| Decision Tree | decision_tree | ✔️ | ✔️ | | C: see R: see |
| Random Forest | random_forest | ✔️ | ✔️ | | C: see R: see |
| AdaBoost | ada_boost | ✔️ | ✔️ | | C: see R: see |
| KNN | knn | ✔️ | ✔️ | | C: see R: see |
| GBDT | gbdt | ✔️ | ✔️ | | C: see R: see |
| Gaussian NB | gaussian_nb | ✔️ | | | see |
| SVM | svm | ✔️ | ✔️ | | C: see R: see |
| Bayesian Ridge | bayesian | | ✔️ | | see |
| LSTM | lstm | ✔️ | ✔️ | ✔️ | see |
| GRU | gru | ✔️ | ✔️ | ✔️ | see |
| MLP | mlp | ✔️ | ✔️ | ✔️ | see |
| FCN | fcn | ✔️ | ✔️ | ✔️ | see |
| ResNet | res_net | ✔️ | ✔️ | ✔️ | see |
| LSTM-FCN | lstm_fcn | ✔️ | ✔️ | ✔️ | see |
| GRU-FCN | gru_fcn | ✔️ | ✔️ | ✔️ | see |
| mWDN | mwdn | ✔️ | ✔️ | ✔️ | see |
| TCN | tcn | ✔️ | ✔️ | ✔️ | see |
| MLSTM-FCN | mlstm_fcn | ✔️ | ✔️ | ✔️ | see |
| InceptionTime | inception_time | ✔️ | ✔️ | ✔️ | see |
| InceptionTimePlus | inception_time_plus | ✔️ | ✔️ | ✔️ | see |
| XceptionTime | xception_time | ✔️ | ✔️ | ✔️ | see |
| ResCNN | res_cnn | ✔️ | ✔️ | ✔️ | see |
| TabModel | tab_model | ✔️ | ✔️ | ✔️ | see |
| OmniScale | omni_scale | ✔️ | ✔️ | ✔️ | see |
| TST | tst | ✔️ | ✔️ | ✔️ | see |
| XCM | xcm | ✔️ | ✔️ | ✔️ | see |

(C: Classification, R: Regression, F: Forecasting)

Here is a complete list of our supported models for computer vision:

| Model | String | Classification | Hyperparameters |
| --- | --- | --- | --- |
| Decision Tree | decision_tree | ✔️ | see |
| Random Forest | random_forest | ✔️ | see |
| AdaBoost | ada_boost | ✔️ | see |
| KNN | knn | ✔️ | see |
| GBDT | gbdt | ✔️ | see |
| Gaussian NB | gaussian_nb | ✔️ | see |
| SVM | svm | ✔️ | see |
| ResNet/ResNeta | res_net | ✔️ | see/see |
| SEResNet | se_res_net | ✔️ | see |
| ResNeXt | res_next | ✔️ | see |
| AlexNet | alex_net | ✔️ | see |
| VGG | vgg | ✔️ | see |
| EfficientNet | efficient_net | ✔️ | see |
| WRN | wrn | ✔️ | see |
| RegNet | reg_net | ✔️ | see |
| SCNet | sc_net | ✔️ | see |
| PNASNet | pnas_net | ✔️ | see |