iASL-Backend: Capstone Final Release
Overview
This repository is for the iASL iOS application. The app uses the iOS device's front-facing camera to record the user signing in ASL, then attempts to convert the ASL to text using machine learning. The application also supports a speech-to-text feature so that Deaf users can read what the other person is saying aloud.
Users can save the ASL-to-text output as a note. iASL also includes instant messaging: users can sign in front of the camera, or speak into the microphone, and have the resulting text sent to someone else.
Usage
The main code for running the training portion of the model can be found in the scripts folder. The main script, train.sh, runs the training network and saves the model to the folder output/p1_train. The training process saves three files, plus one additional weights file for every epoch.
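As a point of reference, per-epoch weights files like these are commonly produced with a checkpoint callback. The sketch below assumes a Keras training loop; the toy network, data, and filename pattern are illustrative stand-ins, not code taken from train.sh:

import os
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import ModelCheckpoint

os.makedirs("output/p1_train", exist_ok=True)

# Toy stand-in network; the real architecture is defined by the training scripts.
model = models.Sequential([layers.Dense(4, activation="relu", input_shape=(8,)),
                           layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Write one weights file per epoch, mirroring the behavior described above.
ckpt = ModelCheckpoint("output/p1_train/weights.{epoch:02d}.hdf5",
                       save_weights_only=True)

x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=3, callbacks=[ckpt], verbose=0)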
Realtime Detection
The realtime-detection.py script takes only a parameter file. This file should contain a model directory (relative to the iASL_OUT directory) where the model and weights are stored; the script loads these files to use as the model. On start-up, a green box appears in the top left of the screen. Place your hand there within the first three seconds. This activates the object tracking so that the region of interest follows your hand. Each displayed frame is sent to the model for classification. On detection, the confidence and label are displayed in a black box. Running detect.sh provides the parameter file and sources the runtime file, so it is suggested to run the bash file rather than the Python script directly.
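The loop described above can be pictured roughly as follows. This is a minimal sketch assuming OpenCV; the KCF tracker, box coordinates, and model interface are assumptions for illustration, not necessarily what realtime-detection.py does:

import time
import cv2

cap = cv2.VideoCapture(0)
box = (10, 10, 150, 150)    # (x, y, w, h) of the green box in the top left
tracker = None
start = time.time()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if tracker is None:
        # Draw the green box and give the user three seconds to place a hand.
        cv2.rectangle(frame, (10, 10), (160, 160), (0, 255, 0), 2)
        if time.time() - start > 3:
            tracker = cv2.TrackerKCF_create()   # requires opencv-contrib-python
            tracker.init(frame, box)
    else:
        ok, box = tracker.update(frame)         # region of interest follows the hand
        if ok:
            x, y, w, h = (int(v) for v in box)
            roi = frame[y:y + h, x:x + w]
            # label, confidence = classify(roi)  # hypothetical model call
            # cv2.putText(frame, f"{label} {confidence:.2f}", (x, y - 5),
            #             cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
    cv2.imshow("iASL", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()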
Video Classification Model
In the containerized directory, there is an environment for deploying a video classifier. This directory includes a Dockerfile for building a deployable Docker image.
Example:
docker build -t tf-server .
docker run -d -p 8080:8080 tf-server
This is possible through the use of Flask in the app.py script. It exposes port 8080 and accepts requests through the /predict route. This POST method takes an argument, vid_stuff, which is a Base64 encoded string of 40 50x50x3 images; the shape must exactly match a flattened 40x50x50x3 array. These inputs are resized back up to 40x150x150x3. In the mdl_dir directory, there is a JSON file containing the model architecture and an hdf5 file with the latest weights. One can load this model to train on their own data. The input to the model is a sequence of shape 40x150x150x3. The model uses the Inception3D architecture, which is built on 3D convolutional neural networks. Currently, the top layer of this model has 55 output nodes, because the model was trained on only 55 words. To increase the number of output nodes, a new Inception3D model must be instantiated and a new top layer put in place.
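A request to /predict can be exercised with a small client like the one below. This is a hedged sketch: it assumes the container's port 8080 is published to host port 8080 and that vid_stuff is sent as form data; the exact request and response formats depend on app.py:

import base64
import numpy as np
import requests

# 40 frames of 50x50 RGB pixels, flattened and Base64 encoded.
frames = np.random.randint(0, 256, size=(40, 50, 50, 3), dtype=np.uint8)
payload = base64.b64encode(frames.tobytes()).decode("ascii")

resp = requests.post("http://localhost:8080/predict", data={"vid_stuff": payload})
print(resp.status_code, resp.text)

Similarly, loading the saved architecture and weights from mdl_dir for further training would look roughly like this; the filenames are placeholders, not the actual names in the repository:

from tensorflow.keras.models import model_from_json

with open("mdl_dir/model.json") as f:        # placeholder filename
    model = model_from_json(f.read())
model.load_weights("mdl_dir/weights.hdf5")   # placeholder filename
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(...) on inputs shaped (batch, 40, 150, 150, 3)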
directory, there exists a JSON file containing the model architecture and an hdf5 file with the latest weights. One can load this model up to train on their own data. The input to the model is a sequence of 40x150x150x3. This uses the Inception3D model, which uses 3D-Convolutional neural networks. Currently, the top layer of this model has 55 output nodes, due to only 55 words that were trained on. To increase the number of output nodes, a new Inception3D model must be instantiated and a new top layer must be put in place.