Online demo of the paper "Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality" published in the Second Scholarly Document Processing Workshop at NAACL-HLT 2021.

Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality

This repository implements an online demo of the paper Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality published in the Second Scholarly Document Processing Workshop (SDProc 2021) at NAACL-HLT 2021.

Usage

Starting the server

This project is based on the Flask framework. Detailed explanations of how to use Flask, and of this project's configuration files, can be found in the excellent Flask Mega-Tutorial. To start the server, run the following command in the root folder of this project:

flask run

Using as a library

Assuming your whole document is stored in a single string variable called text, running

from auto_summ.engine.core.engine_summarization import algorithm

centralities = algorithm(text)

splits it into sentences, computes the centrality of each one according to the algorithm described in the paper, and returns a pandas DataFrame with the following columns:

  • sentence, which contains the sentences found in your document.
  • centrality, which contains the relevance score (essentially the degree centrality) of each sentence.
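A common next step is to turn these scores into an extractive summary by keeping the highest-scoring sentences. The minimal sketch below relies only on the two documented columns; the helper top_k_summary and its parameter k are hypothetical (not part of auto_summ), and it assumes the rows come back in document order with a default integer index. A toy DataFrame with made-up scores stands in for the output of algorithm(text).

```python
import pandas as pd


def top_k_summary(centralities: pd.DataFrame, k: int = 3) -> str:
    """Join the k highest-scoring sentences, kept in their original document order."""
    top = centralities.nlargest(k, "centrality")  # k rows with the largest scores
    top = top.sort_index()  # restore reading order (assumes a default RangeIndex)
    return " ".join(top["sentence"])


# Toy stand-in for the DataFrame returned by algorithm(text); the scores are made up.
centralities = pd.DataFrame({
    "sentence": ["First sentence.", "Second sentence.", "Third sentence.", "Fourth sentence."],
    "centrality": [0.2, 0.9, 0.1, 0.5],
})
print(top_k_summary(centralities, k=2))  # prints "Second sentence. Fourth sentence."
```

Sorting back by index after nlargest keeps the summary readable, since sentences appear in the order the author wrote them rather than by score.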

Features

  • A detailed sentence tokenization process based on regular expressions that can accurately handle most cases found in scientific literature.
  • Unlike the implementation used in the paper, this online demo runs on TF-IDF embeddings for speed and efficiency. You can easily swap these for any of the pre-trained sentence embedding models available at https://www.sbert.net/.
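To illustrate the general pipeline described by these features, here is a minimal, dependency-free sketch. It is not the code in this repository: the abbreviation list, the function names, and the exact centrality formula are assumptions for illustration. It splits text with a regular expression that skips a few common abbreviations, builds TF-IDF vectors with a smoothed IDF, and scores each sentence by weighted degree centrality, i.e. the sum of its cosine similarities to all other sentences. The paper's and this demo's implementations may differ in details such as thresholding or normalization.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviations whose trailing period should not end a sentence.
ABBREVIATION_RE = re.compile(r"\b(?:e\.g|i\.e|et al|fig|eq|vs|cf)\.$", re.IGNORECASE)
# Candidate boundary: '.', '!' or '?', then whitespace, then an uppercase letter.
BOUNDARY_RE = re.compile(r"[.!?]\s+(?=[A-Z])")


def split_sentences(text):
    """Split text at candidate boundaries, skipping those that follow an abbreviation."""
    sentences, start = [], 0
    for match in BOUNDARY_RE.finditer(text):
        candidate = text[start:match.end()].strip()
        if ABBREVIATION_RE.search(candidate):
            continue  # e.g. "models, e.g. BERT": keep the sentence going
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    """Lowercase word tokens made of letters and digits."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors (dicts), treating each sentence as a document."""
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(counts)
    df = Counter(term for c in counts for term in c)  # number of sentences per term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}  # smoothed IDF
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def degree_centrality(sentences):
    """Weighted degree: sum of a sentence's similarities to every other sentence."""
    vectors = tfidf_vectors(sentences)
    return [
        sum(cosine(vectors[i], vectors[j]) for j in range(len(vectors)) if j != i)
        for i in range(len(vectors))
    ]


if __name__ == "__main__":
    text = (
        "Pre-trained models, e.g. BERT, produce sentence embeddings. "
        "Sentence embeddings are compared with cosine similarity. "
        "Central sentences form the summary."
    )
    sentences = split_sentences(text)
    for sentence, score in zip(sentences, degree_centrality(sentences)):
        print(f"{score:.3f}  {sentence}")
```

In this sketch, moving to pre-trained embeddings from https://www.sbert.net/ would only change the vectorization and similarity steps; the degree-centrality scoring stays the same.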

Installation

git clone https://github.com/jarobyte91/auto_summ.git
cd auto_summ
pip install -r requirements.txt

Support

Feel free to send an email to [email protected] or contact me through any of my social media.

Contribute

Feel free to use the Issue Tracker or Pull Requests of this repository.

License

This project is licensed under the MIT License.
