State of the art
Dominik edited this page Jun 11, 2015 · 10 revisions
Method | UCF-101 Accuracy (3-Fold, %) | Notes |
---|---|---|
Modeling Spatial-Temporal Clues (Wu) | 91.3 | 3 parts: Spatial LSTM, Motion LSTM, and fusion of Spatial/Motion CNNs |
LRCN+CNN (Donahue) | 82.92 | Weighted average of RGB (1/3) and Flow (2/3) networks. LRCN placed after the first fully connected CNN layer. |
2stream CNN (Simonyan) Poster | 88.0 | Temporal + Spatial ConvNet. Fusion using SVM. Multi-task learning for the temporal ConvNet. Spatial ConvNet pre-trained on ILSVRC-2012, with fine-tuning of the last layer only. |
LSTM + 30 Frame Unroll (Yue-Hei Ng) | 88.6 | Optical Flow + Image Frames. 1 FPS + Motion information through flow. Re-used GoogLeNet. LSTM performed better than feature pooling architecture. |
Evaluating Two-Stream CNN (Ye) | 87.7 | Takes VGG19 and CNN_M and fine-tunes them (plus more) |
Slow Fusion (Karpathy) | 65.4 | Trained on 1M sport videos first and then used transfer learning. They used multiresolution CNNs (fovea and context stream) and slow fusion. |
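
Several rows above combine an RGB (spatial) network with an optical-flow (temporal) network. The weighted average noted for LRCN+CNN (RGB 1/3, Flow 2/3) can be sketched as below. This is only an illustration: the function names `fuse_scores` and `predict` and the example scores are hypothetical and are not taken from any of the listed papers' code.

```python
# Minimal sketch of late fusion by weighted averaging (assumed helper names).
# The 1/3 and 2/3 weights come from the LRCN+CNN row; the scores are made up.

def fuse_scores(rgb_scores, flow_scores, rgb_weight=1 / 3, flow_weight=2 / 3):
    """Return the per-class weighted average of two score vectors."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both networks must score the same set of classes")
    return [
        rgb_weight * r + flow_weight * f
        for r, f in zip(rgb_scores, flow_scores)
    ]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical softmax outputs for a 3-class toy problem.
rgb = [0.6, 0.3, 0.1]   # the RGB network prefers class 0
flow = [0.2, 0.7, 0.1]  # the flow network prefers class 1

fused = fuse_scores(rgb, flow)
label = predict(fused)  # the flow network carries more weight
```

Because the flow stream has twice the weight of the RGB stream, the fused prediction in this toy case follows the flow network. The SVM-based fusion in the 2stream CNN row is a different technique and is not shown here.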