Tony Joseph edited this page Feb 19, 2019 · 2 revisions

Welcome to the multi-digit detection and classification using attention Wiki!


About

In this work, we propose a method to detect and classify the digits of a sequence in a single image using attention. We use spatial soft attention together with attention regularization to improve the “where-to-look” aspect of attention. The image is first passed through a Convolutional Neural Network (CNN) to extract features. The attention mechanism then attends to the relevant features sequentially to make predictions: at each Recurrent Neural Network (RNN) timestep, it selects specific features to look at. The attention mechanism also has a start state and a stop state, which tell the network when to start looking and when to stop, i.e. when no digits remain to look at.
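The pipeline described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's implementation: the additive scoring function, the plain tanh recurrence, the zero initial hidden state standing in for the start state, the extra "stop" class standing in for the stop state, and every name and shape (`init_params`, `soft_attention`, `decode`, `W_f`, `W_h`, `v`, and so on) are assumptions made for the example. The CNN is replaced by a precomputed feature map, weights are random, and attention regularization is not shown.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def init_params(D, H, A, n_classes=11, seed=0):
    # Hypothetical random weights: D = feature depth, H = RNN hidden size,
    # A = attention size, n_classes = 10 digits + 1 assumed "stop" class.
    rng = np.random.default_rng(seed)
    shapes = {"W_f": (D, A), "W_h": (H, A), "v": (A,),
              "W_ch": (D, H), "W_hh": (H, H), "W_out": (H, n_classes)}
    return {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}

def soft_attention(features, hidden, W_f, W_h, v):
    # features: (L, D) CNN feature map flattened over L spatial locations.
    # hidden:   (H,) current RNN hidden state.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (L,) one score per location
    alpha = softmax(scores)                              # attention weights, sum to 1
    context = alpha @ features                           # (D,) weighted feature average
    return context, alpha

def decode(features, params, max_steps=6, stop_class=10):
    # Assumption: a zero hidden state plays the role of the start state.
    hidden = np.zeros(params["W_hh"].shape[0])
    digits, maps = [], []
    for _ in range(max_steps):
        context, alpha = soft_attention(
            features, hidden, params["W_f"], params["W_h"], params["v"])
        # Simple tanh RNN update driven by the attended features.
        hidden = np.tanh(context @ params["W_ch"] + hidden @ params["W_hh"])
        pred = int(np.argmax(hidden @ params["W_out"]))
        maps.append(alpha)
        if pred == stop_class:  # assumed stop state: no digits remain
            break
        digits.append(pred)
    return digits, maps
```

For example, `decode(np.random.default_rng(1).normal(size=(49, 8)), init_params(D=8, H=16, A=12))` runs the loop over a 7×7 feature map with 8 channels and returns the predicted digits together with one attention map per timestep. With untrained random weights the predictions are meaningless; the sketch only shows how the attention weights and the stop condition fit into the RNN loop.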
