This code is a Python implementation of extractive text summarization using natural language processing (NLP) techniques. It takes a block of text and builds a summary by selecting the highest-scoring sentences, where a sentence's score is the sum of the normalized frequencies of the words it contains.
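Before running the code, ensure that the necessary libraries are available. The code uses the following libraries:
- spacy (third-party, must be installed)
- string (part of the Python standard library, no installation needed)
- heapq (part of the Python standard library, provides the nlargest function used in the summary step)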
To install the spacy library, run the following command (the leading "!" is notebook syntax for Jupyter or Colab; omit it when running in a terminal):
!pip install -U spacy
You will also need to download the English language model for spacy by running the following command:
!python -m spacy download en_core_web_sm
The code can be run using any Python IDE or the command line.
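Before going further, it can help to confirm that the dependencies are importable. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of the original code.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return find_spec(module_name) is not None


# Report whether spaCy and the English model package are importable.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either name is reported as missing, rerun the corresponding installation command above.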
- Import the necessary libraries:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from heapq import nlargest
- Load the spacy language model:
nlp = spacy.load('en_core_web_sm')
- Process the input text (text must already be defined as a string holding the document to summarize):
doc = nlp(text)
- Count word frequencies, skipping stop words and punctuation:
stopwords = list(STOP_WORDS)
punctuation = punctuation + '\n'
word_frequencies = {}
for word in doc:
    token = word.text.lower()
    # Key by the lower-cased form so that the lower-cased lookups
    # in the sentence scoring step find every counted word.
    if token not in stopwords and token not in punctuation:
        word_frequencies[token] = word_frequencies.get(token, 0) + 1
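As a toy illustration of the counting step, here is a sketch that splits words with a regular expression instead of spaCy and uses a tiny hypothetical stop-word set (TOY_STOP_WORDS) in place of STOP_WORDS. Both are assumptions made so the example runs without a language model; it is not the tokenization the code above performs, only the same counting idea.

```python
import re
from collections import Counter

# Hypothetical stand-in for spaCy's STOP_WORDS, kept tiny for illustration.
TOY_STOP_WORDS = {"the", "a", "is", "on", "and"}


def count_words(text):
    """Count lower-cased words, skipping stop words and punctuation."""
    # \w+ matches runs of letters, digits, and underscores, so punctuation
    # is never returned as a word.
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in TOY_STOP_WORDS)


frequencies = count_words("The cat sat on the mat. The cat is happy!")
print(frequencies)
```

Here "cat" is counted twice, while "the", "on", and "is" are dropped as stop words.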
- Normalize the word frequencies:
max_frequency = max(word_frequencies.values())
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / max_frequency
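Normalization divides every count by the largest count, so the most frequent word ends up with a weight of 1.0 and all others fall between 0 and 1. A small worked example with made-up counts (the function name normalize and the empty-input guard are additions for illustration, not part of the original code):

```python
def normalize(frequencies):
    """Scale counts so that the most frequent word has weight 1.0."""
    if not frequencies:
        # max() raises ValueError on an empty sequence, so return early.
        return {}
    max_frequency = max(frequencies.values())
    return {word: count / max_frequency for word, count in frequencies.items()}


# Made-up counts, for illustration only.
weights = normalize({"cat": 4, "mat": 2, "happy": 1})
print(weights)  # {'cat': 1.0, 'mat': 0.5, 'happy': 0.25}
```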
- Tokenize the text into sentences:
sentence_tokens = [sent for sent in doc.sents]
- Calculate the sentence scores:
sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        if word.text.lower() in word_frequencies:
            if sent not in sentence_scores:
                sentence_scores[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_scores[sent] += word_frequencies[word.text.lower()]
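The scoring step sums the weight of every known word in a sentence. The sketch below shows the same idea on plain strings, using a regular expression for word splitting and a made-up weight table; both are assumptions so the example runs without spaCy, and score_sentences is a hypothetical helper name.

```python
import re


def score_sentences(sentences, weights):
    """Sum the weight of every known word in each sentence."""
    scores = {}
    for sentence in sentences:
        for word in re.findall(r"\w+", sentence.lower()):
            if word in weights:
                scores[sentence] = scores.get(sentence, 0.0) + weights[word]
    return scores


# Made-up weights and sentences, for illustration only.
weights = {"cat": 1.0, "mat": 0.5}
sentences = ["The cat sat on the mat.", "The cat purred.", "It rained."]
scores = score_sentences(sentences, weights)
print(scores)
```

As in the loop above, a sentence with no weighted words ("It rained.") never receives an entry, so it cannot be selected for the summary. Because scores are plain sums, longer sentences tend to score higher.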
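- Generate the summary from the top 30% of sentences by score:
select_length = int(len(sentence_tokens)*0.3)
# nlargest is heapq.nlargest; it returns sentences in descending score order.
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)
# Each item in summary is a sentence span, not a single word.
final_summary = [sent.text for sent in summary]
summary = ' '.join(final_summary)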
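Two details of the selection step are worth illustrating: for inputs with fewer than four sentences, int(len(sentence_tokens)*0.3) evaluates to 0 and the summary is empty, and nlargest returns sentences ordered by score rather than by position in the text. The sketch below is one possible variation, not the original behavior: select_top is a hypothetical helper, and both the at-least-one-sentence floor and the re-sorting into document order are assumptions.

```python
from heapq import nlargest


def select_top(sentences, scores, ratio=0.3):
    """Pick the highest-scoring sentences, then restore document order."""
    # Assumption: always keep at least one sentence, even for short inputs.
    count = max(1, int(len(sentences) * ratio))
    top = nlargest(count, scores, key=scores.get)
    # nlargest sorts by score; re-sort by position in the original text.
    return sorted(top, key=sentences.index)


# Made-up sentences and scores, for illustration only.
sentences = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
scores = {"s1": 0.2, "s2": 0.9, "s3": 0.1, "s5": 1.4, "s6": 0.3}
summary = " ".join(select_top(sentences, scores))
print(summary)  # s2 s5
```

With seven sentences and a ratio of 0.3, two sentences are kept; "s5" has the highest score but appears after "s2" in the output because document order is restored.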
This code is released under the MIT License.