Commit 57566ef

Merge branch 'master' of github.com:4paradigm/AutoX
2 parents c30ab25 + 2d7a3f4 commit 57566ef

2 files changed: +300 -159 lines changed

README_EN.md

+17 -159
@@ -1,5 +1,5 @@
11
English | [简体中文](./README.md)
2-
2+
<img src="./img/logo.png" width = "1500" alt="logo" align=center />
33
# What-is-AutoX?
44
AutoX is an efficient AutoML tool, and it is designed for the tabular data modelling for real-world datasets.
55
Its features include:
@@ -8,35 +8,29 @@ Its features include:
88
- Generic & Universal: Supporting tabular data, including binary classification, multi-class classification and regression problems.
99
- Auto: Fully automated pipeline without human-intervention.
1010
- Out of the box: Providing flexible modules which can be used alone.
11-
- Summary of magics: Organize and publish magics of competitions.
12-
13-
## interpretable-ml
14-
AutoX covers following interpretable machine learning methods:
15-
### Global interpretation
16-
- [tree-based model](autox/autox_interpreter/interpreter_demo/global_interpretation/global_surrogate_tree_demo.ipynb)
17-
18-
### Local interpretation
19-
- [LIME](autox/autox_interpreter/interpreter_demo/local_interpretation/lime_demo.ipynb)
20-
- [SHAP](autox/autox_interpreter/interpreter_demo/local_interpretation/shap_demo.ipynb)
11+
- Summary of magics: Organize and publish magics of competitions.
2112

22-
### Influential interpretation
23-
- [nn](autox/autox_interpreter/interpreter_demo/influential_instances/influential_interpretation_nn.ipynb)
24-
- [nn_sgd](autox/autox_interpreter/interpreter_demo/influential_instances/influential_interpretation_nn_sgd.ipynb)
13+
# What-does-AutoX-contain?
14+
- autox_competition: mainly for tabular table data mining competitions
15+
- autox_server: automl service for online deployment
16+
- autox_interpreter: machine learning interpretable function
2517

26-
### Prototypes and Criticisms
27-
- [MMD-critic](autox/autox_interpreter/interpreter_demo/prototypes_and_criticisms/MMD_demo.ipynb)
28-
- [ProtoDash algorithm](autox/autox_interpreter/interpreter_demo/prototypes_and_criticisms/ProtodashExplainer.ipynb)
18+
# Join-the-community
19+
<img src="./img/qr_code_0429.png" width = "200" height = "200" alt="AutoX Community" align=center />
2920

21+
# How-to-contribute-for-AutoX
22+
[how to contribute](./how_to_contribute.md)
3023

3124
# Table-of-Contents
3225
<!-- TOC -->
3326

3427
- [What is AutoX?](#What-is-AutoX?)
28+
- [What does AutoX contain?](#What-does-AutoX-contain?)
29+
- [Join-the-community](#Join-the-community)
30+
- [How to contribute for AutoX](#How-to-contribute-for-AutoX)
3531
- [Table of Contents](#Table-of-Contents)
3632
- [Installation](#Installation)
37-
- [Architecture](#Architecture)
3833
- [Quick Start](#Quick-Start)
39-
- [Summary of Magics](#Summary-of-Magics)
4034
- [Evaluation](#Evaluation)
4135

4236
<!-- /TOC -->
@@ -47,38 +41,11 @@ AutoX covers following interpretable machine learning methods:
4741
3. python setup.py install
4842
```
4943

50-
# Architecture
51-
```
52-
├── autox
53-
│   ├── ensemble
54-
│   ├── feature_engineer
55-
│   ├── feature_selection
56-
│   ├── file_io
57-
│   ├── join_tables
58-
│   ├── metrics
59-
│   ├── models
60-
│   ├── process_data
61-
│   └── util.py
62-
│   ├── CONST.py
63-
│   ├── autox.py
64-
├── run_oneclick.py
65-
└── demo
66-
└── test
67-
├── setup.py
68-
├── README.md
69-
```
70-
7144
# Quick-Start
72-
- Full-Automl
73-
```
74-
from autox import AutoX
75-
path = data_dir
76-
autox = AutoX(target = 'loss', train_name = 'train.csv', test_name = 'test.csv',
77-
id = ['id'], path = path)
78-
sub = autox.get_submit()
79-
sub.to_csv("submission.csv", index = False)
80-
```
81-
- Semi-Automl: run_demo.ipynb
45+
- [autox competition](autox/autox_competition/README_EN.md)
46+
- [autox server](autox/autox_server/README_EN.md)
47+
- [autox interpreter](autox/autox_interpreter/README_EN.md)
48+
8249

8350
# Evaluation
8451
| index |data_type | data_name(link) | metric | AutoX | AutoGluon | H2o |
@@ -87,112 +54,3 @@ sub.to_csv("submission.csv", index = False)
8754
| 2 |regression | [Tabular Playground Series - Aug 2021](https://www.kaggle.com/c/tabular-playground-series-aug-2021) | rmse | 7.87731 | 10.3944 | 7.8895|
8855
| 3 |regression | [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/) | rmse | 0.13043 | 0.13104 | 0.13161 |
8956
| 4 |binary classification | [Titanic](https://www.kaggle.com/c/titanic/) | accuracy | 0.77751 | 0.78229 | 0.79186 |
90-
91-
# Data type
92-
- cat: Categorical, Categorical variable without order.
93-
- ord: Ordinal, Categorical variable with order.
94-
- num: Numeric, Numeric variable.
95-
- datetime: Time variable with Datetime format.
96-
- timestamp: Time variable with Timestamp format.
97-
98-
# Pipeline
99-
- 1.Initialize AutoX
100-
```
101-
1.1 Read data
102-
1.2 Concat train and test
103-
1.3 Identify columns type in data
104-
1.4 Data preprocess
105-
```
106-
- 2.Feature engineering
107-
```
108-
Every feature engineer class includes the following features:
109-
1. auto select columns which will be executed with current operation
110-
2. review the selected columns
111-
3. modify the columns
112-
4. execute the operation, and return features whose sample count and order are consistent with the original table.
113-
```
114-
- 3. Features combination
115-
```
116-
Combine the raw features and derived features, and return wide table.
117-
```
118-
- 4. Data split
119-
```
120-
Split the table into train and test.
121-
```
122-
- 5.Features filter
123-
```
124-
Filter the features according to the distribution of train and test.
125-
```
126-
- 6.Model training
127-
```
128-
Inputs of models are filtered features.
129-
model class includes the following features:
130-
1. get the default parameters
131-
2. model training
132-
3. parameters tuning
133-
4. get the features importance
134-
5. prediction
135-
```
136-
- 7.Prediction
137-
138-
# AutoX
139-
## Attributes
140-
### info_: Information about the data set.
141-
- info_['id']: List, unique keys to identify the sample.
142-
- info_['target']: String, label column.
143-
- info_['shape_of_train']: Int, the number of samples in the train set.
144-
- info_['shape_of_test']: Int, the number of samples in the test set.
145-
- info_['feature_type']: Dict of Dict, data type of the features.
146-
- info_['train_name']: String, the table name of main table of train.
147-
- info_['test_name']: String, the table name of main table of test.
148-
149-
### dfs_: dfs_ contains all DataFrames, including raw tables and derived tables.
150-
- dfs_['train_test']: The combined data of train data and test data.
151-
- dfs_['FE_feature_name']: Derived tables by feature engineering, such as FE_count, FE_groupby.
152-
- dfs_['FE_all']: The merged table which contains raw tables and derived tables.
153-
154-
## Methods
155-
- concat_train_test: concat the train and test data.
156-
- split_train_test: split train and test data.
157-
- get_submit: get the submission.
158-
159-
# Details of operations in the pipeline:
160-
## Data IO
161-
```
162-
```
163-
164-
## Data Pre-process
165-
```
166-
- extract year, month, day, hour, weekday info from time columns
167-
- delete invalid(nunique equal to 1) features
168-
- delete invalid (label is nan) samples
169-
```
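A minimal sketch of the three pre-processing steps listed above, written in plain Python rather than taken from AutoX. The function `preprocess`, the helper `is_missing`, the column names, and the sample rows are all hypothetical; the sketch assumes rows are dictionaries with identical keys and that time values are ISO-formatted strings, and it applies the steps in the order they are listed.

```python
from datetime import datetime
import math


def is_missing(value):
    """Return True for None or a float NaN (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocess(rows, time_cols, target):
    """Apply the three listed steps to a list of row dicts (illustrative only)."""
    rows = [dict(r) for r in rows]  # copy so the caller's rows are not mutated
    if not rows:
        return rows

    # 1. Extract year, month, day, hour and weekday from each time column.
    for r in rows:
        for col in time_cols:
            ts = datetime.fromisoformat(r[col])
            r[f"{col}_year"] = ts.year
            r[f"{col}_month"] = ts.month
            r[f"{col}_day"] = ts.day
            r[f"{col}_hour"] = ts.hour
            r[f"{col}_weekday"] = ts.weekday()  # Monday is 0, Sunday is 6

    # 2. Delete features that take a single unique value (nunique == 1).
    constant = {
        c for c in rows[0]
        if c != target and len({r[c] for r in rows}) == 1
    }
    rows = [{k: v for k, v in r.items() if k not in constant} for r in rows]

    # 3. Delete samples whose label is missing.
    return [r for r in rows if not is_missing(r[target])]


# Made-up sample data: "const" never varies and one label is NaN.
rows = [
    {"id": 1, "ts": "2021-08-01 10:30:00", "const": "a", "loss": 3.0},
    {"id": 2, "ts": "2021-08-02 11:00:00", "const": "a", "loss": float("nan")},
    {"id": 3, "ts": "2021-09-05 23:15:00", "const": "a", "loss": 7.0},
]
clean = preprocess(rows, time_cols=["ts"], target="loss")
```

In this sample the derived `ts_year` column is dropped along with `const`, because every row falls in 2021 and the column therefore has a single unique value.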
170-
171-
## Feature Engineer
172-
173-
- count feature
174-
```
175-
```
176-
177-
- target encoding feature
178-
179-
180-
- shift feature
181-
```
182-
```
183-
184-
## Model Fitting
185-
```
186-
AutoX supports the following models:
187-
1. Lightgbm
188-
2. Xgboost
189-
3. Tabnet
190-
```
191-
192-
## Ensemble
193-
```
194-
AutoX supports two ensemble methods (Bagging is used by default).
195-
1. Stacking;
196-
2. Bagging.
197-
```
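As a generic illustration of bagging (bootstrap aggregating), and not a description of how AutoX implements it, the sketch below fits a simple base learner on several bootstrap resamples and averages the predictions. The least-squares line learner, the function names, and the data are assumptions made for the example; the text above does not say which base models or resampling scheme AutoX combines.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Fit y = a * x + b by ordinary least squares and return a predictor."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b


def bagging_predict(xs, ys, x_new, n_models=10, seed=0):
    """Average the predictions of models fit on bootstrap resamples."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values to fit a line")
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Draw a bootstrap sample (with replacement); redraw if it has
        # fewer than two distinct x values, since a line cannot be fit.
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            if len({xs[i] for i in idx}) >= 2:
                break
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    return mean(preds)


# Made-up, noise-free data on the line y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
prediction = bagging_predict(xs, ys, x_new=10)
```

Because the toy data is noise-free, every resampled model recovers the same line; with noisy data the averaging step is what reduces the variance of the final prediction.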
198-
