Skip to content

prathikr/nlvl-detr

Repository files navigation

NLVL_DETR: Neural Network Architecture

NLVL_DETR (Natural Language Video Localization Detection Transformer) is a neural network designed for the task of localizing moments in videos based on natural language queries. This architecture integrates both video and text processing modules, and uses a Transformer-based encoder-decoder mechanism to predict the span of the relevant video segment.

+--------------+    +--------------------+    +---------------------+    +---------------------+         +---------------------+    +-----------------+
|              |    |                    |    |      Kmeans or      |    |                     | context |                     |    |                 |
| Video Frames +--->| Vision Transformer +--->| Positional Encoding +--->| Transformer Encoder +-------->| Transformer Decoder +--->| Span Prediction |
|              |    |                    |    |                     |    |                     |         |                     |    |                 |
+--------------+    +--------------------+    +---------------------+    +---------------------+         +---------------------+    +-----------------+
                                                                                                                     ^
+------------+    +-------+                                                                                          |
|            |    |       |                        input sequence                                                    |
| Text Query +--->| Phi-2 +-------------------------------------------------------------------------------------------
|            |    |       |
+------------+    +-------+

To view training/eval loss metrics, run tensorboard --logdir results

Link to the Charades-STA dataset. "Data (scaled to 480p, 13GB)" was used for this project.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published