"Toward Talent Scientist: Sharing and Learning Together" --- Jingwei Too
- This toolbox contains 6 widely used machine learning algorithms
- The `Demo_KNN` and `Demo_LDA` files provide examples of how to apply these methods to a benchmark dataset
- You may switch the algorithm by changing the `knn` in `from ML.knn import jkfold` to any of the other abbreviations (see the sketch after this list)
- If you wish to use the linear discriminant analysis (LDA) classifier, then you may write `from ML.lda import jkfold`
- If you want to use the naive Bayes (NB) classifier, then you may write `from ML.nb import jkfold`
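For example, the following minimal sketch switches to the NB classifier without touching the rest of the script. It assumes, as the demos suggest, that every `ML.*` module exposes `jkfold` with the same signature:

```python
from sklearn import datasets
# only this import changes when switching algorithms:
# knn, svm, dt, lda, nb or rf
from ML.nb import jkfold

iris = datasets.load_iris()
feat = iris.data     # feature matrix
label = iris.target  # label vector

# the call is identical for every classifier module
mdl = jkfold(feat, label, {'kfold': 10})
print(mdl['acc'])
```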
Input variables:
- `feat`: feature vector matrix (Instances x Features)
- `label`: label matrix (Instances x 1)
- `opts`: parameter settings
- `ho`: ratio of testing data in hold-out validation
- `kfold`: number of folds in k-fold cross-validation
Output variables:
- `mdl`: machine learning model (it contains several results)
- `acc`: classification accuracy
- `con`: confusion matrix
- `r`: precision and recall
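Taken together, every validator follows the same input/output contract. A short sketch (the keys shown are the ones documented above; any other contents of `mdl` are not specified here, and the toy data is for illustration only):

```python
import numpy as np
from ML.knn import jkfold

# toy inputs for illustration only
feat = np.random.rand(60, 4)         # feature matrix (Instances x Features)
label = np.random.randint(0, 2, 60)  # label vector (Instances x 1)
opts = {'k': 5, 'kfold': 10}         # parameter settings

mdl = jkfold(feat, label, opts)      # machine learning model
print(mdl['acc'])                    # classification accuracy
print(mdl['con'])                    # confusion matrix
print(mdl['r'])                      # precision and recall
```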
There are three types of performance validation. The validation strategies are listed below (KNN is adopted as an example; a combined sketch follows the list).
- Hold-out validation: `from ML.knn import jho`
- K-fold cross-validation: `from ML.knn import jkfold`
- Leave-one-out cross-validation: `from ML.knn import jloo`
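A side-by-side sketch of the three strategies, assuming `jho`, `jkfold`, and `jloo` can all be imported from the same module, as the individual imports above indicate:

```python
from sklearn import datasets
from ML.knn import jho, jkfold, jloo

iris = datasets.load_iris()
feat, label = iris.data, iris.target

mdl_ho = jho(feat, label, {'k': 5, 'ho': 0.3})       # 30% hold-out
mdl_kf = jkfold(feat, label, {'k': 5, 'kfold': 10})  # 10-fold CV
mdl_loo = jloo(feat, label, {'k': 5})                # leave-one-out
```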
Example 1: K-nearest neighbor (KNN) with k-fold cross-validation

```python
import numpy as np
import pandas as pd
# change this to switch algorithm & types of validation (jho, jkfold, jloo)
from ML.knn import jkfold
import matplotlib.pyplot as plt
import seaborn as sns
# load data
data = pd.read_csv('ionosphere.csv')
data = data.values
feat = np.asarray(data[:, 0:-1])
label = np.asarray(data[:, -1])
# parameters
k = 5        # number of neighbours in KNN
kfold = 10   # number of folds in cross-validation
opts = {'k':k, 'kfold':kfold}
# KNN with k-fold
mdl = jkfold(feat, label, opts)
# overall accuracy
accuracy = mdl['acc']
# confusion matrix
confmat = mdl['con']
print(confmat)
# precision & recall
result = mdl['r']
print(result)
# plot confusion matrix
uni = np.unique(label)
# Normalise
con = confmat.astype('float') / confmat.sum(axis=1)[:, np.newaxis]
fig, ax = plt.subplots()
sns.heatmap(con, annot=True, fmt='.2f', xticklabels=uni, yticklabels=uni, cmap="YlGnBu")
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('KNN')
plt.show()
```
Example 2: Support vector machine (SVM) with hold-out validation

```python
import numpy as np
from sklearn import datasets
# change this to switch algorithm & types of validation (jho, jkfold, jloo)
from ML.svm import jho
import matplotlib.pyplot as plt
import seaborn as sns
iris = datasets.load_iris()
feat = iris.data
label = iris.target
# parameters
ho = 0.3 # 30% testing set
kernel = 'rbf'
opts = {'ho':ho, 'kernel':kernel}
# machine learning
mdl = jho(feat, label, opts)
# overall accuracy
accuracy = mdl['acc']
# confusion matrix
confmat = mdl['con']
print(confmat)
# precision & recall
result = mdl['r']
print(result)
# plot confusion matrix
uni = np.unique(label)
# Normalise
con = confmat.astype('float') / confmat.sum(axis=1)[:, np.newaxis]
fig, ax = plt.subplots()
sns.heatmap(con, annot=True, fmt='.2f', xticklabels=uni, yticklabels=uni, cmap="YlGnBu")
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('SVM')
plt.show()
```
Example 3: Linear discriminant analysis (LDA) with leave-one-out validation

```python
import numpy as np
from sklearn import datasets
# change this to switch algorithm & types of validation (jho, jkfold, jloo)
from ML.lda import jloo
import matplotlib.pyplot as plt
import seaborn as sns
iris = datasets.load_iris()
feat = iris.data
label = iris.target
# parameters
opts = {}
# machine learning
mdl = jloo(feat, label, opts)
# overall accuracy
accuracy = mdl['acc']
# confusion matrix
confmat = mdl['con']
print(confmat)
# precision & recall
result = mdl['r']
print(result)
# plot confusion matrix
uni = np.unique(label)
# Normalise
con = confmat.astype('float') / confmat.sum(axis=1)[:, np.newaxis]
fig, ax = plt.subplots()
sns.heatmap(con, annot=True, fmt='.2f', xticklabels=uni, yticklabels=uni, cmap="YlGnBu")
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('LDA')
plt.show()
```
Requirements:
- Python 3
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn
- Click on the name of an algorithm in the table below to check its parameters
- Use the `opts` dictionary to set algorithm-specific parameters; if you do not set extra parameters, the algorithm will use the default settings from here (see the sketch after the table)
| No. | Abbreviation | Name | Support |
|-----|--------------|------|---------|
| 06 | knn | K-nearest Neighbor | Multi-class |
| 05 | svm | Support Vector Machine | Multi-class |
| 04 | dt | Decision Tree | Multi-class |
| 03 | lda | Linear Discriminant Analysis | Multi-class |
| 02 | nb | Naive Bayes | Multi-class |
| 01 | rf | Random Forest | Multi-class |