
American Sign Language Image Recognition Using Deep Learning.

Project Overview:

This repository was created for our capstone project in DSCI 591+592, Capstone I & II, for the Master's in Data Science program at Drexel University. The capstone spans two courses: the first focuses on data acquisition, pre-processing, and EDA; the second focuses on training, modeling, and conclusions. Both phases of the project are stored here.

Our project focused on American Sign Language (ASL). The goal of this project was to use deep learning to accurately identify the different hand signs of the ASL alphabet. We opted to create our own dataset for this project, as most ASL datasets available have incorrect hand signs. We excluded any hand signs that require motion (such as the letters J and Z).

File Manifest:

  • Folder /data - Contains all small data files
    • letter_means_df.csv - Mean pixel values for each letter
  • Folder /documents - Contains all reports as required for DSCI 591 and DSCI 592.
    • Launch Report - Final Version.pdf - PDF of first report for DSCI 591.
    • DSCI 591 Data Acquisition and Pre-Processing Report.pdf - PDF of second report for DSCI 591.
    • DSCI 591 Exploratory Data Analytics Report.docx - Third report for DSCI 591.
    • Pitch.pptx - Slides for first presentation in DSCI 591.
    • Status Report.pptx - Slides for second presentation in DSCI 591.
    • DSCI 592 Predictive Modeling Report.pdf - PDF for report in DSCI 592.
    • G5-PMR-Slides.pptx - Slides for first presentation in DSCI 592.
    • G5-PMR-Slides-Final.pptx - Slides for second presentation in DSCI 592.
    • G5-PMR-Final.docx - Final report of project for DSCI 592.
  • Folder /figures - Contains all relevant figures created for the project
    • Folder /FeatureMeansSummary - Contains all summary figures for each letter
      • StandardDeviationOfAllClasses.png - Composite image of standard deviations of each pixel mean value across each letter. (See ASLImageRecognition.ipynb for more information.)
    • Folder /cnn - Contains figures from CNN models.
      • cnn_transfer_accuracy.png - Plot of training and validation accuracy across epochs for CNN.
      • cnn_transfer_loss.png - Plot of training and validation loss across epochs for CNN.
    • Folder /knn - Contains KNN confusion matrix for different values of k.
      • k_accuracy_plot.png - Plot of K versus accuracy for KNN, created manually outside of the project.
  • Folder /CNN OUTPUT - Contains performance outputs for CNN hyperparameter tuning.
  • Folder /CNN TRANSFER OUTPUT - Contains performance outputs for Transfer CNN.
  • Folder /KNN OUTPUT - Contains performance outputs for KNN.
  • ASLImageRecognition.ipynb - Jupyter Notebook used for some EDA visualizations
  • step01_VideoSplitPreProcess.py - Python script to split raw video footage into series of frames
  • step02_CropPreProcess.py - Python script to center and crop images
  • step03_GrayscalePreProcess.py - Python script to convert images to grayscale
  • step04_CSVConversionPreProcess.py - Python script to convert grayscale images to CSV
  • step05_EDA.py - Python script to produce some EDA visualizations
  • step06_training.py - Python script for training of KNN, CNN, and transfer-CNN.
  • step07_split_images.py - Python script to split and format images for transfer CNN.

Reason for Project:

Deep learning is a very powerful -- and equally dangerous -- tool. Our goal is to use this technology to help more vulnerable populations. Showing examples of data science being used for the betterment of society may inspire others to do the same, rather than having newly graduated data scientists fall into the trap of working exclusively for private industry and rarely asking: who does this project serve, and who does it harm?

The lack of diversity in technology is something we have been aware of for years. While it may be less obvious in some cases, it became all too clear while working on this project. As mentioned above, when researching datasets we soon came to realize that most of them were wrong. The letter T hand sign, for example, was fully incorrect. The thousands of individuals who downloaded the dataset, and the hundreds who completed projects using it, either didn't know ASL, didn't think to check whether the ASL dataset was correct, or, worse, didn't care to.

It's here that a project changes from just a proof-of-concept to something more meaningful. While we may not have answers now, we can at least start to ask "Who does this project serve?" If the dataset is incorrect, how can algorithms that are built using it possibly serve those who are deaf or hard-of-hearing?

An accurate ASL hand sign identification tool can serve both those who need this language and those who would like to learn it. Identifying the alphabet is the first step toward identifying words and sentences and translating ASL in real time. At the very least, this project -- and our newly created dataset -- can be used as a first step toward a tool that checks hand signs through a webcam for, say, an individual who is trying to learn ASL remotely.

Team Members:

Our team consisted of the following individuals (alphabetized by last name):

Project Requirements:

  • Access to videos showing different ASL hand signs. (For the time being, you can access ours here, but these may not be available to the public in the future.)
  • IDE to run Python scripts
  • Correct folder organization (see the individual .py files and their functions, as well as the How to Execute Notebook section, for more information)

Python Requirements:

  • Python ≥ 3.8
  • keras version 2.9.0
  • The following modules and packages are required (a sample install command follows this list):
    • csv (standard library)
    • cv2 (installed as opencv-python)
    • itertools (standard library)
    • matplotlib
    • numpy
    • os (standard library)
    • pandas
    • PIL.Image, PIL.ImageOps (installed as Pillow)
    • sklearn (installed as scikit-learn)
    • sys (standard library)
    • VGG16 (provided by keras.applications, not a separate package)
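
Of the list above, csv, itertools, os, and sys ship with Python; the rest are third-party. A typical install might look like the following (pinning TensorFlow 2.9 to obtain the bundled keras 2.9.0 is our assumption, not a project requirement):

pip install opencv-python matplotlib numpy pandas Pillow scikit-learn tensorflow==2.9.*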

How to Execute Notebook:

NOTE: These instructions are for Windows. You may need to modify the commands (for example, the path separators) for macOS.

  1. Download repository.
  2. Download the videos stored here. We recommend downloading the zip of Videos, unzipping it, and adding it to the repository. You can also create your own 30-second .mp4 clips for each letter (excluding J and Z). Regardless of what you decide, add these videos to the repository with the following file organization:
  /ASLImageRecognition
    /Videos
      A.mp4
      B.mp4
      ...
  3. Open a terminal window in your preferred Python IDE; we recommend Visual Studio Code. The current working directory should be /ASLImageRecognition for all of the following commands.

  4. Split videos into images. Running the following command will split the .mp4 files in /Videos into images. They'll be saved in uncropped_frames:

python .\step01_VideoSplitPreProcess.py
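
For illustration, here is a minimal sketch of how one video can be split into frames with OpenCV; step01_VideoSplitPreProcess.py handles all of the letters and its own file naming, so treat the paths and names below as assumptions:

# Sketch only -- paths and naming are assumptions, not the script's actual behavior.
import os
import cv2

os.makedirs("uncropped_frames", exist_ok=True)
cap = cv2.VideoCapture("Videos/A.mp4")
i = 0
while True:
    ok, frame = cap.read()   # read one frame at a time
    if not ok:               # no more frames in the video
        break
    cv2.imwrite(f"uncropped_frames/A_{i:04d}.png", frame)
    i += 1
cap.release()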
  5. Crop images. Running the following command will crop the images from 1080x1920 pixels to 224x224 pixels. These images will be saved in cropped_frames:
python .\step02_CropPreProcess.py
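
The crop itself can be as simple as the following Pillow sketch, which cuts a centered 224x224 box out of a frame (whether the actual script crops from the center, or resizes first, is an assumption):

# Sketch of a center crop -- the real script's crop box may differ.
from PIL import Image

img = Image.open("uncropped_frames/A_0000.png")
w, h = img.size
left, top = (w - 224) // 2, (h - 224) // 2
img.crop((left, top, left + 224, top + 224)).save("cropped_frames/A_0000.png")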
  6. Convert images to grayscale. Running the following command will convert the images to grayscale. These images will be saved in gray_frames:
python .\step03_GrayscalePreProcess.py
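
The conversion relies on PIL.ImageOps (listed under Python Requirements); a minimal sketch for a single image, with hypothetical file names:

# Sketch of grayscale conversion for one image; file names are hypothetical.
from PIL import Image, ImageOps

img = Image.open("cropped_frames/A_0000.png")
ImageOps.grayscale(img).save("gray_frames/A_0000.png")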
  7. Convert images to .csv. Running the following command will import all images for each class and save them into a single .csv per class. These .csv files will be saved in csv_files. After step04_CSVConversionPreProcess.py has completed, you will have all necessary files to run the ASLImageRecognition.ipynb notebook, if you wish.
python .\step04_CSVConversionPreProcess.py
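
Conceptually, each grayscale image becomes one row of pixel values in its class's .csv. A minimal sketch for a single class (the file layout and naming are assumptions):

# Sketch of flattening one class's frames into a CSV, one row per image.
import csv
import glob
import numpy as np
from PIL import Image

with open("csv_files/A.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for path in sorted(glob.glob("gray_frames/A_*.png")):
        pixels = np.asarray(Image.open(path)).flatten()  # 224*224 = 50176 values
        writer.writerow(pixels.tolist())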
  8. Create EDA visualizations. Running the following command will produce the EDA figures for this project, which will be saved in figures/FeatureMeansSummary:
python .\step05_EDA.py
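
As one example of what these figures are built from, /data/letter_means_df.csv holds the mean pixel values for each letter, which can be reshaped back into a 224x224 image (the exact row/column layout of the CSV is an assumption):

# Sketch of plotting one letter's mean image from letter_means_df.csv.
import pandas as pd
import matplotlib.pyplot as plt

means = pd.read_csv("data/letter_means_df.csv", index_col=0)
row = means.iloc[0].to_numpy()            # mean pixel values for one letter
plt.imshow(row.reshape(224, 224), cmap="gray")
plt.title(f"Mean image: {means.index[0]}")
plt.savefig("figures/FeatureMeansSummary/mean_example.png")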
  9. Create numpy files for KNN. Running the following command will generate the required numpy files for KNN (and save these in numpy_files):
#create numpy files for KNN
python .\step06_training.py DimReduce
  10. Create numpy files for CNN. Running the following command will generate the required numpy files for CNN (and save these in numpy_files):
#create numpy files for CNN
python .\step06_training.py csvToNpy
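
Under the hood, this kind of conversion amounts to stacking every class's CSV rows into arrays and saving them with numpy. A rough sketch (the array names, shapes, and any train/test splitting in step06_training.py may differ):

# Sketch of converting the per-class CSVs into numpy arrays.
import glob
import os
import numpy as np
import pandas as pd

os.makedirs("numpy_files", exist_ok=True)
images, labels = [], []
for path in sorted(glob.glob("csv_files/*.csv")):
    letter = os.path.splitext(os.path.basename(path))[0]  # e.g. "A" from A.csv
    rows = pd.read_csv(path, header=None).to_numpy()
    images.append(rows)
    labels.extend([letter] * len(rows))
np.save("numpy_files/images.npy", np.vstack(images))
np.save("numpy_files/labels.npy", np.array(labels))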
  11. Run KNN. Now that the required numpy files are made, you can run the first two models. There are several options for KNN to tune the value of K. If you want to run KNN on more than one K value, you must include HyperTweak in your command. Regardless of the option you select, confusion matrix figures will be saved in figures/knn. Some examples are below:
#Run KNN with default K=10
python .\step06_training.py KNN

#Run KNN with K=3, 5, 10, 15, 20, 30, 40, 50
python .\step06_training.py KNN HyperTweak 3 5 10 15 20 30 40 50
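
In sklearn terms, each KNN run boils down to something like the sketch below (the .npy file names are hypothetical; see step06_training.py for the real ones):

# Sketch of the KNN fit/evaluate loop over several values of K.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X_train = np.load("numpy_files/X_train.npy")  # hypothetical file names
y_train = np.load("numpy_files/y_train.npy")
X_test = np.load("numpy_files/X_test.npy")
y_test = np.load("numpy_files/y_test.npy")

for k in (3, 5, 10):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    preds = knn.predict(X_test)
    print(f"K={k}: accuracy={accuracy_score(y_test, preds):.3f}")
    cm = confusion_matrix(y_test, preds)  # plotted and saved in figures/knn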

  12. Run CNN. If VisLayers is selected, figures will be saved in figures/cnn/cnn_layers. There are several options for CNN to tune various hyperparameters:
#runs CNN with default values of epochs=10, kernel_size=[5,3], dropout=0.2, strides=[5,3] 
#NOTE: First kernel_size and stride values are used in the first convolutional layer, the second values are used in the second convolutional layer.
python .\step06_training.py CNN

#runs CNN with default values and outputs feature map visualizations, saved in figures/cnn/cnn_layers
python .\step06_training.py CNN VisLayers

#runs CNN with hyperparameter loops.  Uses epochs=10, kernel_sizes=[[5,3],[4,4]], dropouts=[.2,.25], strides=[5,3] 
# NOTE: this command requires a lot of RAM. Edit the parameter options on lines 397-398 of step06_training.py to reduce the memory required.
python .\step06_training.py CNN HPLoop
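
For orientation, a CNN matching the default hyperparameters above might look roughly like this in keras (the layer widths, optimizer, and loss are assumptions, not the authors' exact architecture):

# Sketch of a two-conv-layer CNN using the defaults named above.
# Assumed: 224x224 grayscale inputs and 24 classes (A-Z minus J and Z).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(224, 224, 1)),
    layers.Conv2D(32, kernel_size=5, strides=5, activation="relu"),  # first conv: kernel 5, stride 5
    layers.Conv2D(64, kernel_size=3, strides=3, activation="relu"),  # second conv: kernel 3, stride 3
    layers.Flatten(),
    layers.Dropout(0.2),
    layers.Dense(24, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)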
  13. Copy images for Transfer CNN. Transfer CNN requires that training, testing, and validation images be in different folders. Running the following command will reconfigure the image organization:
python .\step07_split_images.py
  14. Run Transfer CNN. The following command will run Transfer CNN:
python .\step06_training.py CNN_Transfer
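
The transfer model freezes a pretrained convolutional base and trains a new classification head on top. Below is a minimal VGG16 sketch; the head layers and the folder names produced by step07_split_images.py are assumptions (VGG16 expects 3-channel 224x224 input, so the grayscale frames are loaded as RGB):

# Sketch of transfer learning with a frozen VGG16 base.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained weights

model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(24, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical folder names produced by step07_split_images.py.
train_ds = keras.utils.image_dataset_from_directory("split_images/train", image_size=(224, 224))
val_ds = keras.utils.image_dataset_from_directory("split_images/validation", image_size=(224, 224))
# model.fit(train_ds, validation_data=val_ds, epochs=10)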

Known Limitations of Project:

  • ASL is not universal. Sign languages differ even among English-speaking countries; British Sign Language, for example, is distinct from ASL. Because of this, the project can only serve those who know ASL.

  • Little testing. As this is a new dataset, it has not been tested by the data science community at large. Because of this, we have no benchmark to compare our results to, nor can others assess the quality of our dataset and offer suggestions. In time, however, this dataset will become available on Kaggle once it has been sufficiently pre-processed.

  • Skin color. While we improved the dataset so it could better serve those who are deaf and hard-of-hearing, we did not tackle the challenge of different skin colors. The dataset is composed of images from a single individual. Adding images featuring many individuals with different skin colors may lower raw performance by introducing more variety; more importantly, it may improve the average performance across all skin colors and make the dataset (and the tools built from it) more accessible.
