gesture-recognizer.txt

1) What does the recognizer do?
The $P recognizer compares the input point cloud with each of its known template point clouds and returns the template with the smallest matching distance as the classification result, i.e. the most probable gesture [1]. Matching two point clouds is an assignment problem that the Hungarian algorithm from graph theory solves exactly; $P instead uses a simpler "greedy heuristic" to stay in line with its predecessors in the "$-family".
The algorithm is quite similar to the $1 recognizer and performs the same preprocessing/normalizing steps except for the rotation step: first the candidate point cloud and all template point clouds are resampled, scaled and translated to the origin. Then the algorithm iterates over all templates to find the one with the minimum matching distance. For each template T it iterates over all points in the candidate point cloud C and, for the current point, computes the Euclidean distance to every point in T that has no match yet in order to find the closest one. The distance to that closest point is multiplied by a weight and added to the sum for this template. The weight (starting at 1.0) denotes the confidence in the match and decreases with each point in C: as fewer points in the template cloud remain unmatched, the chance that the chosen point is actually the closest point in T overall decreases as well. Because the direction of the matching matters (i.e. whether the algorithm searches for the closest point in T for each point in C or vice versa), this step is computed for both directions and the minimum of the two is returned. In the end, the template point cloud with the minimum distance computed this way represents the most probable gesture.
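
The matching loop described above can be sketched roughly as follows. This is only a minimal illustration, not the reference implementation from [1]: it assumes that both clouds are already resampled to the same number of points, scaled and translated, and the names cloud_distance, greedy_cloud_match and classify as well as the linear weight 1 - i/n are assumptions chosen for this sketch.

```python
import math

def cloud_distance(points, template):
    """One-directional greedy matching (sketch): each point takes its
    nearest still-unmatched template point; distances are weighted."""
    n = len(points)  # assumption: both clouds hold the same number of points
    matched = [False] * n
    total = 0.0
    for i, p in enumerate(points):
        best_d, best_j = math.inf, -1
        for j, t in enumerate(template):
            if matched[j]:
                continue
            d = math.dist(p, t)  # Euclidean distance
            if d < best_d:
                best_d, best_j = d, j
        matched[best_j] = True
        weight = 1.0 - i / n  # assumed weighting: starts at 1.0, then decreases
        total += weight * best_d
    return total

def greedy_cloud_match(points, template):
    """Match in both directions and keep the smaller distance."""
    return min(cloud_distance(points, template),
               cloud_distance(template, points))

def classify(candidate, templates):
    """Return the name of the template with the smallest distance."""
    return min(templates,
               key=lambda name: greedy_cloud_match(candidate, templates[name]))

# Hypothetical, already normalized three-point clouds
templates = {
    "line": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "corner": [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
}
candidate = [(2.0, 0.1), (0.0, 0.1), (1.0, 0.1)]  # a line, points in scrambled order
result = classify(candidate, templates)
```

Because every point only looks for its nearest unmatched partner, the scrambled point order of the candidate does not prevent it from matching the "line" template.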


2) Name an advantage of the $P recognizer over the $1 recognizer.
The biggest advantage of the $P recognizer is that it represents the drawn gestures as "time-free point clouds" instead of time-ordered unistrokes (or multistrokes). Previous recognizers of the "$-family" like the $N recognizer had to create and store all possible permutations of a gesture, based on the number of strokes it can be drawn with and their order, so that they were still able to recognize it as the same gesture when it was drawn differently. With the $1 recognizer, a gesture could not be recognized at all unless it was drawn in the same way as one of the existing templates. The use of point clouds makes it possible to recognize such gestures correctly without having to store all possible permutations and therefore reduces both the memory needed and the execution time [1]. This means that, in comparison to the $1 recognizer, the exact draw order isn't important anymore, which makes the $P recognizer far more flexible.
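
The effect of dropping the draw order can be illustrated with a small toy comparison. This is a simplified sketch and not the actual $1 or $P code: path_distance stands in for any index-aligned, time-ordered comparison, unordered_distance for an order-free point cloud comparison, and both function names and the sample points are made up for this example.

```python
import math

def path_distance(a, b):
    """Time-ordered comparison: point i of a is compared with point i of b."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def unordered_distance(a, b):
    """Order-free comparison: each point of a takes its nearest
    still-unmatched point of b, regardless of position in the list."""
    remaining = list(b)
    total = 0.0
    for p in a:
        nearest = min(remaining, key=lambda q: math.dist(p, q))
        remaining.remove(nearest)
        total += math.dist(p, nearest)
    return total / len(a)

# An "L" shape drawn from top to bottom, then to the right
template = [(0.0, 2.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
# The same shape drawn in the opposite order
candidate = list(reversed(template))

ordered = path_distance(candidate, template)
unordered = unordered_distance(candidate, template)
```

Here the ordered comparison reports a clearly non-zero distance although both gestures cover exactly the same points, while the order-free comparison yields zero.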


3) What is the minimum matching distance?
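The minimum matching distance defines the "goodness" of a match between two point clouds C and T (i.e. the drawn candidate and a template): every point in C is paired with exactly one point in T, and the distance is the smallest sum of Euclidean distances over all such pairings [1]. The template T with the minimum matching distance is therefore the most probable gesture to be returned for candidate C.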
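
For very small clouds the minimum matching distance can be computed by brute force, which may help to make the definition concrete. The sketch below assumes that the goodness of a matching is the sum of Euclidean distances between the paired points, as in [1]; the function name exact_matching_distance and the sample clouds are hypothetical. It tries every one-to-one pairing, so it is only usable for a handful of points; the Hungarian algorithm finds the same minimum in polynomial time, and $P approximates it greedily.

```python
import math
from itertools import permutations

def exact_matching_distance(c, t):
    """Smallest total Euclidean distance over all one-to-one
    pairings of the points in c with the points in t (n! pairings)."""
    return min(
        sum(math.dist(p, t[j]) for p, j in zip(c, perm))
        for perm in permutations(range(len(t)))
    )

# Two tiny clouds: t is c shifted up by 1 and listed in a different order
c = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
t = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]

distance = exact_matching_distance(c, t)
```

In this example the best pairing matches every point in c with the point directly above it, so the minimum matching distance is 3 * 1.0.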


Sources:
[1] Vatavu, R. D., Anthony, L., & Wobbrock, J. O. (2012, October). Gestures as point clouds: a $P recognizer for user interface prototypes. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (pp. 273-280).