  English | [简体中文](./README.md)
-
+ <img src="./img/logo.png" width="1500" alt="logo" align=center />
  # What-is-AutoX?
  AutoX is an efficient AutoML tool designed for tabular data modelling on real-world datasets.
  Its features include:
@@ -8,35 +8,29 @@ Its features include:
  - Generic & Universal: Supporting tabular data, including binary classification, multi-class classification and regression problems.
  - Auto: Fully automated pipeline without human intervention.
  - Out of the box: Providing flexible modules which can be used alone.
- - Summary of magics: Organize and publish magics of competitions.
-
- ## interpretable-ml
- AutoX covers following interpretable machine learning methods:
- ### Global interpretation
- - [tree-based model](autox/autox_interpreter/interpreter_demo/global_interpretation/global_surrogate_tree_demo.ipynb)
-
- ### Local interpretation
- - [LIME](autox/autox_interpreter/interpreter_demo/local_interpretation/lime_demo.ipynb)
- - [SHAP](autox/autox_interpreter/interpreter_demo/local_interpretation/shap_demo.ipynb)
+ - Summary of magics: Organize and publish magics of competitions.

- ### Influential interpretation
- - [nn](autox/autox_interpreter/interpreter_demo/influential_instances/influential_interpretation_nn.ipynb)
- - [nn_sgd](autox/autox_interpreter/interpreter_demo/influential_instances/influential_interpretation_nn_sgd.ipynb)
+ # What-does-AutoX-contain?
+ - autox_competition: mainly for tabular data mining competitions
+ - autox_server: AutoML service for online deployment
+ - autox_interpreter: interpretable machine learning functions

- ### Prototypes and Criticisms
- - [MMD-critic](autox/autox_interpreter/interpreter_demo/prototypes_and_criticisms/MMD_demo.ipynb)
- - [ProtoDash algorithm](autox/autox_interpreter/interpreter_demo/prototypes_and_criticisms/ProtodashExplainer.ipynb)
+ # Join-the-community
+ <img src="./img/qr_code_0429.png" width="200" height="200" alt="AutoX Community" align=center />

+ # How-to-contribute-for-AutoX
+ [how to contribute](./how_to_contribute.md)

  # Table-of-Contents
  <!-- TOC -->

  - [What is AutoX?](#What-is-AutoX?)
+ - [What does AutoX contain?](#What-does-AutoX-contain?)
+ - [Join-the-community](#Join-the-community)
+ - [How to contribute for AutoX](#How-to-contribute-for-AutoX)
  - [Table of Contents](#Table-of-Contents)
  - [Installation](#Installation)
- - [Architecture](#Architecture)
  - [Quick Start](#Quick-Start)
- - [Summary of Magics](#Summary-of-Magics)
  - [Evaluation](#Evaluation)

  <!-- /TOC -->
@@ -47,38 +41,11 @@ AutoX covers following interpretable machine learning methods:
  3. python setup.py install
  ```

- # Architecture
- ```
- ├── autox
- │   ├── ensemble
- │   ├── feature_engineer
- │   ├── feature_selection
- │   ├── file_io
- │   ├── join_tables
- │   ├── metrics
- │   ├── models
- │   ├── process_data
- │   └── util.py
- │   ├── CONST.py
- │   ├── autox.py
- ├── run_oneclick.py
- └── demo
- └── test
- ├── setup.py
- ├── README.md
- ```
-
  # Quick-Start
- - Full-Automl
- ```
- from autox import AutoX
- path = data_dir
- autox = AutoX(target = 'loss', train_name = 'train.csv', test_name = 'test.csv',
-               id = ['id'], path = path)
- sub = autox.get_submit()
- sub.to_csv("submission.csv", index = False)
- ```
- - Semi-Automl: run_demo.ipynb
+ - [autox competition](autox/autox_competition/README_EN.md)
+ - [autox server](autox/autox_server/README_EN.md)
+ - [autox interpreter](autox/autox_interpreter/README_EN.md)
+

  # Evaluation
  | index | data_type | data_name(link) | metric | AutoX | AutoGluon | H2o |
@@ -87,112 +54,3 @@ sub.to_csv("submission.csv", index = False)
  | 2 | regression | [Tabular Playground Series - Aug 2021](https://www.kaggle.com/c/tabular-playground-series-aug-2021) | rmse | 7.87731 | 10.3944 | 7.8895 |
  | 3 | regression | [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/) | rmse | 0.13043 | 0.13104 | 0.13161 |
  | 4 | binary classification | [Titanic](https://www.kaggle.com/c/titanic/) | accuracy | 0.77751 | 0.78229 | 0.79186 |
-
- # Data type
- - cat: Categorical, Categorical variable without order.
- - ord: Ordinal, Categorical variable with order.
- - num: Numeric, Numeric variable.
- - datetime: Time variable with Datetime format.
- - timestamp: Time variable with Timestamp format.
-
- # Pipeline
- - 1.Initialize AutoX
- ```
- 1.1 Read data
- 1.2 Concat train and test
- 1.3 Identify column types in data
- 1.4 Data preprocess
- ```
- - 2.Feature engineering
- ```
- Every feature engineer class includes the following features:
- 1. automatically select the columns the current operation will be applied to
- 2. review the selected columns
- 3. modify the columns
- 4. execute the operation, and return features whose number and order of samples are consistent with the original table.
- ```
- - 3.Features combination
- ```
- Combine the raw features and derived features, and return a wide table.
- ```
- - 4.Data split
- ```
- Split the table into train and test.
- ```
- - 5.Features filter
- ```
- Filter the features according to the distribution of train and test.
- ```
- - 6.Model training
- ```
- Inputs of models are filtered features.
- Every model class includes the following features:
- 1. get the default parameters
- 2. model training
- 3. parameters tuning
- 4. get the features importance
- 5. prediction
- ```
- - 7.Prediction
-
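Steps 1.1 to 1.3 and step 4 of the removed Pipeline section can be sketched with pandas. This is only an illustration under stated assumptions, not AutoX's actual implementation: the helper names `concat_tables`, `guess_column_types` and `split_tables`, the `is_train` marker column, and the dtype-based type heuristic are all hypothetical.

```python
import pandas as pd


def concat_tables(train, test):
    """Stack train and test so features can be computed on both at once."""
    train = train.assign(is_train=True)   # hypothetical marker column
    test = test.assign(is_train=False)
    return pd.concat([train, test], axis=0, ignore_index=True)


def guess_column_types(df, skip):
    """Label each column 'datetime', 'num' or 'cat' from its pandas dtype."""
    types = {}
    for col in df.columns:
        if col in skip:
            continue
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            types[col] = "datetime"
        elif pd.api.types.is_numeric_dtype(df[col]):
            types[col] = "num"
        else:
            types[col] = "cat"
    return types


def split_tables(df):
    """Undo concat_tables: return the train rows and the test rows."""
    is_train = df["is_train"].astype(bool)
    train = df[is_train].drop(columns="is_train")
    test = df[~is_train].drop(columns="is_train").reset_index(drop=True)
    return train, test


# Toy data; the column names are made up for this example.
train = pd.DataFrame({"id": [1, 2, 3], "city": ["a", "b", "a"],
                      "price": [1.0, 2.5, 3.0], "loss": [0, 1, 0]})
test = pd.DataFrame({"id": [4, 5], "city": ["b", "c"], "price": [2.0, 4.0]})

full = concat_tables(train, test)
col_types = guess_column_types(full, skip=["id", "loss", "is_train"])
train_part, test_part = split_tables(full)
```

Working on the concatenated table keeps derived features consistent between train and test, which is presumably why the pipeline concatenates before feature engineering and splits afterwards.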
- # AutoX
- ## Attributes
- ### info_: Information about the data set.
- - info_['id']: List, unique keys to identify the sample.
- - info_['target']: String, label column.
- - info_['shape_of_train']: Int, the number of samples in the train set.
- - info_['shape_of_test']: Int, the number of samples in the test set.
- - info_['feature_type']: Dict of Dict, data type of the features.
- - info_['train_name']: String, the table name of main table of train.
- - info_['test_name']: String, the table name of main table of test.
-
- ### dfs_: dfs_ contains all DataFrames, including raw tables and derived tables.
- - dfs_['train_test']: The combined data of train data and test data.
- - dfs_['FE_feature_name']: Derived tables by feature engineering, such as FE_count, FE_groupby.
- - dfs_['FE_all']: The merged table which contains raw tables and derived tables.
-
- ## Methods
- - concat_train_test: concat the train and test data.
- - split_train_test: split train and test data.
- - get_submit: get the submission.
-
- # Details of operations in the pipeline:
- ## Data IO
- ```
- ```
-
- ## Data Pre-process
- ```
- - extract year, month, day, hour, weekday info from time columns
- - delete invalid (nunique equal to 1) features
- - delete invalid (label is nan) samples
- ```
-
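The three pre-processing steps listed in the removed block can be sketched with pandas. This is a hedged illustration, not AutoX's code: the function name `preprocess`, its parameters, and the suffixes of the derived columns are assumptions made for the example.

```python
import pandas as pd


def preprocess(df, target, time_cols):
    """Illustrative version of the three pre-processing steps."""
    df = df.copy()
    # 1. Expand each time column into year, month, day, hour and weekday.
    for col in time_cols:
        ts = pd.to_datetime(df[col])
        df[col + "_year"] = ts.dt.year
        df[col + "_month"] = ts.dt.month
        df[col + "_day"] = ts.dt.day
        df[col + "_hour"] = ts.dt.hour
        df[col + "_weekday"] = ts.dt.weekday  # Monday is 0, Sunday is 6
    # 2. Drop features that take a single value (nunique equal to 1).
    #    Note that nunique() ignores NaN by default.
    constant = [c for c in df.columns if c != target and df[c].nunique() == 1]
    df = df.drop(columns=constant)
    # 3. Drop samples whose label is NaN.
    return df.dropna(subset=[target]).reset_index(drop=True)


# Toy training table; the column names are made up for this example.
raw = pd.DataFrame({
    "ts": ["2021-08-01 10:00:00", "2021-09-02 11:30:00", "2022-01-03 12:00:00"],
    "const": [7, 7, 7],
    "loss": [0.5, None, 1.5],
})
clean = preprocess(raw, target="loss", time_cols=["ts"])
```

Step 3 only makes sense on rows that are supposed to have a label, so in this sketch `preprocess` is meant to be applied to the training table rather than to the combined train and test data.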
- ## Feature Engineer
-
- - count feature
- ```
- ```
-
- - target encoding feature
-
-
- - shift feature
- ```
- ```
-
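The removed section names three feature types without describing them. A minimal pandas sketch of what such features commonly look like is given here; it is an assumption about their meaning, not a copy of AutoX's feature engineering classes, and the table and column names are invented. The target encoding is the naive per-category mean; in practice it is usually computed out-of-fold to avoid leaking the label.

```python
import pandas as pd

# Toy table; every column name here is hypothetical.
df = pd.DataFrame({
    "user": ["a", "b", "a", "a", "b"],
    "day": [1, 1, 2, 3, 2],
    "amount": [10.0, 20.0, 30.0, 50.0, 40.0],
    "label": [1, 0, 0, 1, 1],
})

# Count feature: how often each category value occurs in the table.
df["user_count"] = df.groupby("user")["user"].transform("count")

# Target encoding (naive): mean label of all rows sharing the category.
df["user_target_mean"] = df.groupby("user")["label"].transform("mean")

# Shift feature: the previous value of a column within each group,
# taken in time order; the first row of each group has no predecessor.
df = df.sort_values(["user", "day"])
df["amount_prev"] = df.groupby("user")["amount"].shift(1)
df = df.sort_index()  # restore the original row order
```

All three features keep one value per input row in the original order, which matches the requirement stated in the Pipeline section that derived features stay aligned with the original table.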
- ## Model Fitting
- ```
- AutoX supports the following models:
- 1. LightGBM
- 2. XGBoost
- 3. TabNet
- ```
-
- ## Ensemble
- ```
- AutoX supports two ensemble methods (Bagging is used by default).
- 1. Stacking;
- 2. Bagging.
- ```
-
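The removed Ensemble section does not say how AutoX implements bagging. As a point of reference, textbook bagging (bootstrap aggregating) trains each model on a resample drawn with replacement and averages the predictions. The sketch below shows that idea with NumPy, using a least-squares linear model as a stand-in for the gradient-boosting and neural models listed above; all function names are hypothetical and AutoX's own procedure may differ.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; stands in for any base model."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with the coefficients returned by fit_linear."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


def bagging_predict(X_train, y_train, X_test, n_models=10, seed=0):
    """Train each model on a bootstrap resample and average the predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        # Draw row indices with replacement (one bootstrap sample).
        idx = rng.integers(0, len(X_train), size=len(X_train))
        coef = fit_linear(X_train[idx], y_train[idx])
        preds.append(predict_linear(coef, X_test))
    return np.mean(preds, axis=0)


# Synthetic regression data, generated only for this example.
rng = np.random.default_rng(42)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)
X_new = rng.normal(size=(5, 3))
y_bagged = bagging_predict(X, y, X_new)
```

Stacking differs in that a second-level model is trained on the base models' out-of-fold predictions instead of taking a plain average.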