shubha3/Image-Classification-ML-Project

A Comparative Study of Learning Algorithms for Image Classification

Arkonil Dhar       Saumyadip Bhowmick       Shreya Pramanik
Shubha Sankar Banerjee         Souvik Bhattacharyya

M.Sc. Statistics
Indian Institute of Technology Kanpur


Duration: July 2021


Data Description

The MNIST dataset (Modified National Institute of Standards and Technology) is a large database of handwritten digits, chosen here as the dataset for testing different image classification techniques.

The data is provided as MATLAB (.mat) files, which need to be loaded into the Jupyter notebook in due course.
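The .mat files can be read with SciPy. The sketch below is illustrative, not the project's actual code: the file name and the variable keys (`X`, `y`) inside the .mat file are assumptions.

```python
# Hypothetical sketch of loading MNIST images from a MATLAB file with SciPy.
# The keys "X" and "y" are assumed; check the actual .mat file's contents
# with loadmat(path).keys() first.
import numpy as np
from scipy.io import loadmat

def load_mnist_mat(path):
    """Return (images, labels) from a .mat file, with pixels scaled to [0, 1]."""
    mat = loadmat(path)
    X = mat["X"].astype(np.float32) / 255.0  # flatten-agnostic pixel scaling
    y = mat["y"].ravel()                      # labels as a 1-D array
    return X, y
```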

The data is in two sets: Unrotated and Rotated.

The unrotated dataset, as the name suggests, consists of observations where the digits appear upright from the perspective of the viewer.

[Figure: sample unrotated image]

The rotated dataset, on the other hand, consists of observations where the digits have been rotated clockwise or anti-clockwise.

[Figure: sample rotated image]

Methodology

We have assessed the efficacy of a variety of learning algorithms to judge their suitability for this particular dataset.

In particular we have used the following learning algorithms:

  1. Discriminant Analysis
    • Linear Discriminant Analysis
    • Quadratic Discriminant Analysis
  2. Decision Trees
    • Application of Cost Complexity Pruning
    • Random Forest
  3. Support Vector Machine
  4. Neural Network
  5. Convolutional Neural Network

In this project, we applied the learning algorithms listed above to predict the labels of the observations, both on the unrotated and on the merged (unrotated + rotated) data.

Brief Summary


Discriminant Analysis

  • Trained the Linear and Quadratic Discriminant models on the unrotated training dataset.
  • Tried to predict the labels of the observations in the unrotated test dataset.
  • Tried to predict the labels of the observations in the rotated test dataset.
  • Applied Principal Component Analysis (PCA) as a means of dimension reduction and then tried to repeat the above steps using the principal components as observations.
  • Merged the training sets of rotated and unrotated datasets.
  • Trained LDA and QDA models on this merged dataset.
  • Tried to predict the labels of the merged test dataset.
  • Used PCA on this merged dataset to reduce the dimension of the feature space and tried to repeat the above steps using the principal components as observations.
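The steps above can be sketched with scikit-learn. This is a minimal illustration of the LDA/QDA + PCA workflow, not the project's exact code; `X_train`/`y_train` and the number of principal components are placeholders.

```python
# Hedged sketch: fit LDA and QDA on raw features and on PCA-reduced features,
# mirroring the discriminant-analysis steps described above.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.pipeline import make_pipeline

def fit_discriminant_models(X_train, y_train, n_components=50):
    """Return a dict of fitted LDA/QDA models, with and without PCA."""
    return {
        "lda": LinearDiscriminantAnalysis().fit(X_train, y_train),
        "qda": QuadraticDiscriminantAnalysis().fit(X_train, y_train),
        "pca+lda": make_pipeline(
            PCA(n_components=n_components), LinearDiscriminantAnalysis()
        ).fit(X_train, y_train),
        "pca+qda": make_pipeline(
            PCA(n_components=n_components), QuadraticDiscriminantAnalysis()
        ).fit(X_train, y_train),
    }
```

Predicting on the unrotated, rotated, or merged test sets is then just `models["lda"].predict(X_test)` for each fitted model.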

Decision Tree

  • Applied the Decision Tree algorithm on the original dataset, with and without dimension reduction via PCA.
  • Applied Decision Tree with Cost Complexity Pruning on the same dataset.
  • Trained a Random Forest Model on the unrotated dataset.
  • Tested the above three models on the rotated test dataset.
  • Merged the training sets of scaled rotated and unrotated datasets.
  • Applied Decision Tree Algorithm on the merged dataset with and without Cost Complexity Pruning.
  • Tested the above models on the merged test set.
  • Trained a Random Forest model on the merged dataset and tested it on the merged test set.
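Cost-complexity pruning can be done with scikit-learn's pruning-path API, as sketched below. This is a hedged illustration of the steps above, not the project's code; the validation-split variables are placeholders.

```python
# Hedged sketch: grow a full decision tree, extract the cost-complexity
# pruning path, and refit with the ccp_alpha that maximises validation
# accuracy. A random forest is fitted the same way for comparison.
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def prune_tree(X_train, y_train, X_val, y_val):
    """Return the tree refitted with the best-validating ccp_alpha."""
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_train, y_train
    )
    best_alpha, best_acc = 0.0, -1.0
    for alpha in path.ccp_alphas:
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        acc = tree.fit(X_train, y_train).score(X_val, y_val)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(
        X_train, y_train
    )

def fit_random_forest(X_train, y_train, n_estimators=100):
    """Fit a random forest; n_estimators is an illustrative default."""
    return RandomForestClassifier(
        n_estimators=n_estimators, random_state=0
    ).fit(X_train, y_train)
```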

Support Vector Machine

  • Trained SVM model on the unrotated training dataset using RBF Kernel.
  • Tried to predict the labels of the observations in the unrotated test dataset.
  • Tried to predict the labels of the observations in the rotated test dataset.
  • Applied Principal Component Analysis (PCA) as a means of dimension reduction and then tried to repeat the above steps using the principal components as observations.
  • Merged the training sets of scaled rotated and unrotated datasets.
  • Trained SVM model on this merged dataset using RBF Kernel.
  • Tried to predict the labels using the merged test dataset.
  • Used PCA on this merged dataset to reduce the dimension of the feature space and tried to repeat the above steps using the principal components as observations.
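The RBF-kernel SVM pipeline above can be sketched as follows, assuming scikit-learn. The scaling step, `C`, and the number of principal components are illustrative choices, not the project's settings.

```python
# Hedged sketch: an RBF-kernel SVM pipeline with optional PCA dimension
# reduction, matching the steps described above.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def make_rbf_svm(n_components=None):
    """Build a scaler -> (optional PCA) -> RBF SVM pipeline."""
    steps = [StandardScaler()]
    if n_components is not None:
        steps.append(PCA(n_components=n_components))
    steps.append(SVC(kernel="rbf", C=10.0, gamma="scale"))
    return make_pipeline(*steps)
```

The same pipeline is then fitted once on the unrotated training set and once on the merged training set, and scored on the corresponding test sets.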

Neural Network

  • Trained Neural Network models on the unrotated training dataset with varying model architectures.
  • Selected the model yielding best validation accuracy and tried to predict the labels of the observations in the unrotated test dataset.
  • Tried to predict the labels of the observations in the rotated test dataset.
  • Applied Principal Component Analysis (PCA) as a means of dimension reduction and then tried to repeat the above steps using the principal components as observations.
  • Repeated the above procedure for the merged train dataset and tried to predict the labels of the observations in the merged test dataset.
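Selecting among architectures by validation accuracy can be sketched as below. Here scikit-learn's `MLPClassifier` stands in for whatever framework the project actually used, and the candidate architectures are illustrative.

```python
# Hedged sketch: train several fully-connected architectures and keep the
# one with the best validation accuracy, as described above.
from sklearn.neural_network import MLPClassifier

def best_mlp(X_train, y_train, X_val, y_val,
             architectures=((64,), (128,), (128, 64))):
    """Return (best_model, best_validation_accuracy) over the candidates."""
    best_model, best_acc = None, -1.0
    for hidden in architectures:
        mlp = MLPClassifier(
            hidden_layer_sizes=hidden, max_iter=50, random_state=0
        ).fit(X_train, y_train)
        acc = mlp.score(X_val, y_val)
        if acc > best_acc:
            best_model, best_acc = mlp, acc
    return best_model, best_acc
```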

Convolutional Neural Network

  • Trained a Convolutional Neural Network on the unrotated training dataset with manually chosen convolution and pooling layers.
  • Selected the model yielding the best validation accuracy via RandomSearch hyperparameter tuning.
  • Predicted the labels of the observations in the unrotated test dataset using the best model from RandomSearch.
  • Merged the training and test sets of the unrotated and rotated datasets.
  • Applied the previously selected best CNN model to the merged dataset.
  • Predicted the labels of the observations in the merged dataset.

Results

The following table shows the test accuracies of the different learning algorithms when the models have been trained on data obtained from the unrotated dataset.

| Learning Algorithm | Unrotated test data (trained on given features) | Rotated test data (trained on given features) | Unrotated test data (trained on principal components) |
|---|---|---|---|
| LDA | 87.22% | 9.39% | 87.22% |
| QDA | 54.28% | 9.97% | 13.63% |
| Complete decision tree | 87.8% | 10.26% | 80.13% |
| Decision tree with cost-complexity pruning | 88.31% | - | - |
| Random forest | 96.87% | 10.42% | - |
| Support vector machine | 97.88% | 9.96% | 96.85% |
| Neural network | 97.91% | 34.21% | 96.98% |
| Convolutional neural network | 99.09% | - | - |

The following table shows the test accuracies of the different learning algorithms when the models have been trained on the merged data obtained from the unrotated and rotated datasets.

| Learning Algorithm | Trained on given data set | Trained on principal components |
|---|---|---|
| LDA | 58.41% | 56.76% |
| QDA | 17.125% | 14.98% |
| Complete decision tree | 68.89% | - |
| Decision tree with cost-complexity pruning | 69.42% | - |
| Random forest | 87.58% | - |
| Support vector machine | 90.25% | 88.51% |
| Neural network | 91.54% | - |
| Convolutional neural network | 96.53% | - |
