NLVL_DETR
(Natural Language Video Localization Detection Transformer) is a neural network that localizes moments in a video from a natural language query. The architecture combines video and text processing modules with a Transformer-based encoder-decoder that predicts the span of the video segment matching the query.
+--------------+    +--------------------+    +---------------------+    +---------------------+         +---------------------+    +-----------------+
|              |    |                    |    |      Kmeans or      |    |                     | context |                     |    |                 |
| Video Frames +--->| Vision Transformer +--->| Positional Encoding +--->| Transformer Encoder +-------->| Transformer Decoder +--->| Span Prediction |
|              |    |                    |    |                     |    |                     |         |                     |    |                 |
+--------------+    +--------------------+    +---------------------+    +---------------------+         +----------+----------+    +-----------------+
                                                                                                                    ^
+------------+    +-------+                                                                                        |
|            |    |       |                input sequence                                                          |
| Text Query +--->| Phi-2 +-----------------------------------------------------------------------------------------+
|            |    |       |
+------------+    +-------+
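The sketch below shows, in PyTorch, how these pieces could fit together. It is a minimal illustration, not the repo's implementation: the class name, all dimensions (256-d model, 768-d ViT features, 2560-d Phi-2 features), the learned positional encoding (standing in for the k-means option), the stock nn.Transformer layers, and the pooled (start, end) span head are all assumptions.

```python
# Minimal sketch of the diagram's pipeline; every size and module choice here
# is an assumption for illustration, not the repo's actual implementation.
import torch
import torch.nn as nn

class NLVLDETRSketch(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=4,
                 vit_dim=768, phi2_dim=2560, max_frames=512):
        super().__init__()
        # Project pre-extracted features into a shared model dimension.
        self.video_proj = nn.Linear(vit_dim, d_model)   # ViT frame features
        self.text_proj = nn.Linear(phi2_dim, d_model)   # Phi-2 token features
        # Learned positional encoding (standing in for the k-means option).
        self.pos_embed = nn.Embedding(max_frames, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        # Span head: predict a normalized (start, end) pair for the segment.
        self.span_head = nn.Linear(d_model, 2)

    def forward(self, frame_feats, text_feats):
        # frame_feats: (B, T, vit_dim) ViT embeddings of sampled frames
        # text_feats:  (B, L, phi2_dim) Phi-2 embeddings of the query tokens
        v = self.video_proj(frame_feats)
        pos = self.pos_embed(torch.arange(v.size(1), device=v.device))
        memory = self.encoder(v + pos)          # video context for the decoder
        q = self.text_proj(text_feats)          # decoder input sequence
        out = self.decoder(q, memory)           # query tokens attend to video
        # Pool the decoded query and map it to (start, end) in [0, 1].
        return self.span_head(out.mean(dim=1)).sigmoid()

model = NLVLDETRSketch()
span = model(torch.randn(2, 64, 768), torch.randn(2, 12, 2560))
print(span.shape)  # torch.Size([2, 2])
```

Note that in a classic DETR setup the decoder consumes learned object queries; here the Phi-2 token sequence serves as the decoder input, matching the "input sequence" arrow in the diagram.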
To view training/eval loss metrics, run `tensorboard --logdir results`.
Link to the Charades-STA dataset. The "Data (scaled to 480p, 13GB)" download was used for this project.
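Charades-STA's temporal annotations are commonly distributed as plain text, one `video_id start end##sentence` record per line. The loader below is a small sketch under that assumption; the file name is a placeholder for wherever the annotation files are unpacked.

```python
# Hedged example of reading Charades-STA annotations; assumes the standard
# "video_id start end##query" line format. The path is a placeholder.
def load_charades_sta(path):
    samples = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue                       # skip blank lines
            span, query = line.strip().split("##", 1)
            video_id, start, end = span.split()
            samples.append({
                "video_id": video_id,          # e.g. "AO8RW"
                "start": float(start),         # segment start, seconds
                "end": float(end),             # segment end, seconds
                "query": query,                # natural-language description
            })
    return samples

# samples = load_charades_sta("charades_sta_train.txt")  # placeholder path
```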