Efficient Generation of Summary Videos from Long-form Content

This project produces concise summary videos from long-form content. It combines a Natural Language Processing (NLP) pipeline that performs extractive summarization on the video transcript with automated video editing that cuts out the matching segments, so the summary keeps only the parts of the original selected by the summarizer.

Introduction

This project generates summary videos from long YouTube videos by following these steps:

1. Downloading a long YouTube video.
2. Retrieving the video's transcript as text (English only).
3. Summarizing the transcript using extractive summarization.
4. Extracting the timestamps of each sentence in the summary paragraph.
5. Merging adjacent timestamps to reduce the number of video segments.
6. Cutting the video at the extracted timestamps.
7. Concatenating the extracted segments into the final summary video.
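The installation steps include pytextrank, a TextRank implementation for spaCy, which is the likely summarizer for the summarization step. The sketch below swaps in a much simpler word-frequency scorer that uses only the standard library, purely to show the property the rest of the pipeline depends on: an extractive summary reuses transcript sentences verbatim, so each selected sentence can be traced back to a timestamp. The function names, the naive sentence splitter, and the stop-word list are illustrative assumptions, not the project's code.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would rely on spaCy's.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "this", "that", "with", "as", "we",
    "you", "i", "be", "so",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, verbatim and in their original order."""
    sentences = split_sentences(text)
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOP_WORDS]
        # Average word frequency, so long sentences are not automatically favoured.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore chronological order
    return " ".join(sentences[i] for i in keep)
```

Because the output sentences appear word for word in the input, they can later be located in the timestamped transcript; an abstractive summarizer that rewrites sentences would break that link.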

Installation

To set up the environment for this project, follow these steps:

Install the required Python packages:
```
pip install -r requirements.txt
```
Download the spaCy English language model (en_core_web_lg):
```
python -m spacy download en_core_web_lg
```

Usage

To run the project, follow these steps:

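If you are working in a Jupyter notebook, install any packages missing from your environment from a cell (MoviePy is needed for the video editing step; the code below uses the MoviePy 1.x API):

```
!pip install youtube-transcript-api
!pip install langdetect
!pip install pytube
!pip install spacy
!pip install pytextrank
!pip install "moviepy<2.0"
!python3 -m spacy download en_core_web_lg
```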
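The steps below start from a summary `paragraph` and a transcript list `json_list` without showing how they are produced. A minimal sketch of the glue around the transcription and summarization steps, assuming each transcript entry is a dict with 'text', 'start', and 'duration' keys (the shape returned by youtube-transcript-api's classic `get_transcript` call, and the keys `merge_dicts` relies on): `extract_video_id` pulls out the video ID that youtube-transcript-api expects from the full URL that pytube uses, and `transcript_to_text` joins the entries into the text handed to the summarizer. Both helper names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a youtube.com/watch?v=... or youtu.be/... URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        return parsed.path.lstrip("/").split("/")[0]
    if "youtube.com" in host:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Other common forms: /shorts/<id>, /embed/<id>, /live/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            return parts[1]
    raise ValueError(f"Unrecognised YouTube URL: {url}")


def transcript_to_text(entries):
    """Join transcript entries ({'text', 'start', 'duration'}) into one paragraph."""
    # Caption chunks often contain line breaks; collapse all whitespace runs
    # to single spaces and skip chunks that are empty after stripping.
    return " ".join(" ".join(e["text"].split()) for e in entries if e["text"].strip())
```

The English-only check could then be as simple as `langdetect.detect(text) == "en"`, using the langdetect package installed above.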

Match the summary paragraph against the transcript entries (`json_list`) to recover the timestamps of the summarized sentences:

```python
matched_json = match_text_in_json(paragraph, json_list)
```
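The README does not show `match_text_in_json`. One plausible approach, sketched here as a hypothetical re-implementation rather than the project's actual function: normalize both texts, then keep every transcript entry whose text appears verbatim (on word boundaries) inside the summary. It assumes entries are dicts with 'text', 'start', and 'duration' keys in seconds, and the `min_words` parameter is an added assumption that skips filler chunks such as "so" or "okay", which would otherwise match almost anywhere.

```python
import re


def _normalize(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s']", " ", text.lower()).split())


def match_text_in_json(paragraph, json_list, min_words=3):
    """Return {'start', 'duration'} for transcript entries found in the summary.

    Hypothetical sketch: an entry matches when its normalized text occurs
    verbatim, on word boundaries, in the normalized summary paragraph.
    """
    # Pad with spaces so a match can only begin and end at word boundaries.
    haystack = f" {_normalize(paragraph)} "
    matched = []
    for entry in json_list:
        needle = _normalize(entry["text"])
        # Very short chunks would produce false positives, so skip them.
        if len(needle.split()) < min_words:
            continue
        if f" {needle} " in haystack:
            matched.append({"start": entry["start"], "duration": entry["duration"]})
    return matched
```

A caption chunk that straddles a summary sentence boundary is only partly contained in the summary and is therefore dropped, so this sketch can clip the first or last words of a sentence; matching per sentence with a fuzzy overlap score would be one way to refine it.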

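Merge overlapping or nearly adjacent segments (gaps of up to one second) to reduce the number of cuts:

```python
def merge_dicts(input_list, gap_tolerance=1.0):
    """Merge segments that overlap or are at most `gap_tolerance` seconds apart.

    Each segment is a dict with 'start' and 'duration' in seconds. Returns new
    dicts; the input list and its dicts are not modified.
    """
    output_list = []
    # Process segments in chronological order.
    for segment in sorted(input_list, key=lambda s: s['start']):
        start = segment['start']
        end = start + segment['duration']
        if output_list:
            last = output_list[-1]
            last_end = last['start'] + last['duration']
            if start <= last_end + gap_tolerance:
                # Extend the previous segment; max() keeps it from shrinking
                # when this segment lies entirely inside it.
                last['duration'] = max(last_end, end) - last['start']
                continue
        output_list.append({'start': start, 'duration': end - start})

    return output_list


merged_matched_json = merge_dicts(matched_json)
```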

Download the YouTube video:

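```python
from pytube import YouTube

url = "https://www.youtube.com/watch?v=..."  # replace with the video to summarize

yt = YouTube(url)
# Progressive streams contain both audio and video; order_by() sorts in
# ascending order, so desc() is needed to pick the highest resolution.
selected_stream = (
    yt.streams.filter(progressive=True, file_extension='mp4')
    .order_by('resolution')
    .desc()
    .first()
)
selected_stream.download(filename='original_video.mp4')
```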

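Cut the video into the merged segments and concatenate them into the summary video:

```python
# Written for MoviePy 1.x; MoviePy 2.x drops moviepy.editor and renames
# subclip() to subclipped().
from moviepy.editor import VideoFileClip, concatenate_videoclips

original_video = VideoFileClip("original_video.mp4")
video_segments = merged_matched_json

clips = []
for segment in video_segments:
    start = segment['start']
    # Clamp the end so the last caption cannot run past the end of the file.
    end = min(start + segment['duration'], original_video.duration)
    if end > start:
        clips.append(original_video.subclip(start, end))

summary_video = concatenate_videoclips(clips)
summary_video.write_videofile("summary_video.mp4", codec="libx264")
original_video.close()
```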
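To see how much shorter the summary will be before spending time on rendering, you can total the merged segment durations. A minimal sketch; `summary_stats` is a hypothetical helper, and in the pipeline above `video_duration` would come from MoviePy's `original_video.duration` (the clip length in seconds).

```python
def summary_stats(segments, video_duration):
    """Return (summary_seconds, fraction_of_original) for merged segments.

    Segments are dicts with 'start' and 'duration' in seconds. Any part of a
    segment that runs past the end of the video is not counted.
    """
    total = 0.0
    for seg in segments:
        end = min(seg["start"] + seg["duration"], video_duration)
        total += max(0.0, end - seg["start"])
    return total, total / video_duration if video_duration else 0.0
```

For example, `summary_seconds, fraction = summary_stats(merged_matched_json, original_video.duration)` reports the summary length and the share of the original it keeps.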

Results and Conclusions

The project generates summary videos by filtering the transcript down to the sentences chosen by the extractive summarizer and extracting the corresponding video segments. Watching the summary instead of the full video reduces the time needed to grasp its main points, which makes long videos more accessible and usable for a wide range of audiences.

Limitations and Future Scope

The project has limitations. Cutting and merging video segments breaks continuity, so the summary can jump abruptly from one sentence to the next. In addition, extractive summarization is an active area of research and AI is advancing rapidly, so the current model pipeline is likely to fall behind the state of the art. It is therefore recommended to replace it with newer state-of-the-art models as they become available.

In the future, the project could be improved by using AI video generation models to create video segments from the audio and the original video segments, thereby preserving continuity. Video generation models under development as of June 2024 include OpenAI's Sora, which is not open source and was not publicly released at that time, and the open-source CogVideo.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Original File

For detailed project information, refer to the original file located here.

Repository contents:

Input_video.mp4
LICENSE
Output_video.mp4
README.md
main_notebook.ipynb
requirements.txt
script.py

bojadlabalaji/Enhanced-Video-Summarization

Folders and files

Latest commit

History

Repository files navigation

Efficient Generation of Summary Videos from Long-form Content

Introduction

Installation

Usage

Results and Conclusions

Limitations and Future Scope

License

Original File

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages