
Text Summarization Tool using Word Frequency Algorithm


jaf107/Text-Summarization-Tool


Text Summarization Tool (R&D Project for SPL 1 Course)

This repository contains a text summarization tool written in Java. It was developed as a research and development (R&D) project for the Software Product Development (SPL 1) course.

Introduction

Text summarization is the process of generating a concise and coherent summary of a longer text while preserving its key information. This tool implements three extractive approaches, each of which builds the summary by selecting sentences from the original text rather than generating new ones.

R&D Project for SPL 1 Course

The project had three goals: explore the field of text summarization, implement three extractive algorithms, and compare their performance.

Summarization Approaches

The tool utilizes the following extractive approaches for text summarization:

  1. Word Frequency Algorithm: This approach ranks sentences based on the frequency of important words in each sentence. The sentences with higher frequencies of essential words are considered more relevant and are included in the summary.

  2. TextRank Algorithm: Inspired by Google's PageRank algorithm, TextRank treats each sentence as a node in a graph, with edges weighted by how similar two sentences are. A sentence is considered important when it is strongly connected to other important sentences, and the highest-ranked sentences are selected to form the summary.

  3. Direct Method Algorithm: The Direct Method Algorithm uses a scoring function to rank sentences. The scores are calculated based on different features, such as sentence length, position in the text, and keyword frequency. The top-scoring sentences are chosen for the summary.
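
As an illustration of the first approach, the following is a minimal sketch of word-frequency scoring. It is not the repository's actual code: the class name `WordFrequencySketch`, the tiny stop-word list, the regex-based sentence splitter, and the choice to score a sentence by summing the document-wide frequencies of its content words are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of word-frequency summarization (not the tool's own class).
final class WordFrequencySketch {
    // Tiny stop-word list, assumed for the example; a real tool would use a fuller one.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to", "it", "they"));

    // Naive splitter: breaks after '.', '!' or '?' when followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    // Lowercased alphanumeric tokens with stop words removed.
    static List<String> contentWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) words.add(w);
        }
        return words;
    }

    // Returns up to maxSentences top-scoring sentences, in their original order.
    static List<String> summarize(String text, int maxSentences) {
        List<String> sentences = splitSentences(text);

        // Count how often each content word occurs in the whole text.
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String w : contentWords(s)) freq.merge(w, 1, Integer::sum);
        }

        // Score each sentence as the sum of the frequencies of its content words.
        double[] scores = new double[sentences.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            for (String w : contentWords(sentences.get(i))) scores[i] += freq.get(w);
            order.add(i);
        }

        // Stable sort by descending score, so ties keep document order.
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        int k = Math.max(0, Math.min(maxSentences, order.size()));
        List<Integer> chosen = new ArrayList<>(order.subList(0, k));
        Collections.sort(chosen); // restore original sentence order

        List<String> summary = new ArrayList<>();
        for (int i : chosen) summary.add(sentences.get(i));
        return summary;
    }

    public static void main(String[] args) {
        String text = "Cats sleep. Cats and dogs play. Birds sing.";
        for (String s : summarize(text, 2)) System.out.println(s);
    }
}
```

Summing frequencies favors longer sentences; dividing each score by the number of content words in the sentence is one common way to offset that, at the cost of favoring very short sentences.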

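The graph-based ranking used by the second approach can be sketched in a similar way. This is an illustrative example only, not the repository's implementation: the class name `TextRankSketch`, the use of Jaccard word overlap as the edge weight, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of TextRank-style sentence ranking.
final class TextRankSketch {
    // Set of lowercased alphanumeric tokens in a sentence.
    static Set<String> wordSet(String sentence) {
        Set<String> words = new HashSet<>(Arrays.asList(
                sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        words.remove("");
        return words;
    }

    // Jaccard overlap between two word sets (an assumed similarity measure).
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    // Runs a weighted PageRank-style iteration over the sentence similarity graph.
    static double[] rank(String[] sentences, double damping, int iterations) {
        int n = sentences.length;
        Set<String>[] words = new Set[n];
        for (int i = 0; i < n; i++) words[i] = wordSet(sentences[i]);

        // Symmetric edge weights and the total weight leaving each node.
        double[][] weight = new double[n][n];
        double[] outSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double w = similarity(words[i], words[j]);
                weight[i][j] = w;
                weight[j][i] = w;
                outSum[i] += w;
                outSum[j] += w;
            }
        }

        double[] score = new double[n];
        Arrays.fill(score, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double incoming = 0.0;
                for (int j = 0; j < n; j++) {
                    // Each neighbor passes on its score in proportion to the edge weight.
                    if (outSum[j] > 0) incoming += weight[j][i] / outSum[j] * score[j];
                }
                next[i] = (1 - damping) + damping * incoming;
            }
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "The cat sat on the mat",
            "The cat lay on the mat",
            "Stock prices fell sharply"
        };
        System.out.println(Arrays.toString(rank(sentences, 0.85, 50)));
    }
}
```

A summary would then be formed by taking the sentences with the highest scores. A sentence that shares no words with any other sentence receives only the baseline score of `1 - damping`.
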
Performance Evaluation

To assess the effectiveness of the three algorithms, performance evaluation is conducted using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric. ROUGE is a widely used metric in natural language processing and summarization tasks, which measures the quality of a summary by comparing it to one or more reference summaries.

The tool generates summaries using each algorithm and compares them against reference summaries using ROUGE scores. This evaluation allows us to gain insights into the strengths and weaknesses of each algorithm and determine which one performs better for different types of texts and summarization requirements.
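
To make the comparison concrete, here is a minimal sketch of one ROUGE variant, ROUGE-1, which counts overlapping unigrams between a generated summary and a single reference. The text above does not say which ROUGE variant the tool reports, so the choice of ROUGE-1, the class name `Rouge1Sketch`, and the simple lowercase tokenizer are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of ROUGE-1 scoring against one reference summary.
final class Rouge1Sketch {
    // Counts lowercased alphanumeric tokens in the text.
    static Map<String, Integer> unigramCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Returns {recall, precision, f1} for the candidate against the reference.
    static double[] score(String candidate, String reference) {
        Map<String, Integer> cand = unigramCounts(candidate);
        Map<String, Integer> ref = unigramCounts(reference);

        int candTotal = 0;
        for (int c : cand.values()) candTotal += c;

        int refTotal = 0;
        int overlap = 0;
        for (Map.Entry<String, Integer> e : ref.entrySet()) {
            refTotal += e.getValue();
            // Clipped match: a word counts at most as often as it appears in both texts.
            overlap += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
        }

        double recall = refTotal == 0 ? 0.0 : (double) overlap / refTotal;
        double precision = candTotal == 0 ? 0.0 : (double) overlap / candTotal;
        double f1 = (recall + precision) == 0.0
                ? 0.0
                : 2 * recall * precision / (recall + precision);
        return new double[] {recall, precision, f1};
    }

    public static void main(String[] args) {
        double[] s = score("the cat sat", "the cat sat on the mat");
        System.out.printf("recall=%.3f precision=%.3f f1=%.3f%n", s[0], s[1], s[2]);
    }
}
```

Recall measures how much of the reference the summary covers, while precision measures how much of the summary appears in the reference. Reporting both helps separate a summary that is short but accurate from one that is long and merely covers more words.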

How to Use the Tool

To use the Text Summarization Tool, you can follow these steps:

  1. Clone the repository to your local machine:

       git clone https://github.com/jaf107/text-summarization-tool.git

  2. Open the Java project in your preferred IDE.

  3. Run the main program, providing the input text that you want to summarize.

  4. The program will process the input text using the three extractive summarization algorithms.

  5. Finally, the program will display the summaries generated by each algorithm along with their ROUGE scores, allowing you to compare their performance.

Additional Resources

For a deeper understanding of text summarization and the algorithms used in this tool, you can refer to the following resources:

  1. Text Summarization in 5 Steps using NLTK

  2. Introduction to Text Summarization using TextRank in Python

  3. YouTube Video: Text Summarization with the Direct Method Algorithm

Feel free to contribute to the repository by adding new features, improving existing algorithms, or suggesting enhancements.

Happy text summarization!
